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Isometric Multi-Manifolds Learning 



Abstract — Isometric feature mapping (Isomap) is a promis- 
ing manifold learning method. However, Isomap fails to work 
on data which distribute on clusters in a single manifold or 
manifolds. Many works have been done on extending Isomap 
to multi-manifolds learning. In this paper, we proposed a new 
multi-manifolds learning algorithm (M-Isomap) with the help 
of a general procedure. The new algorithm preserves intra- 
manifold geodesies and multiple inter-manifolds edges faithfully. 
Compared with previous approaches, this algorithm can isomet- 
rically learn data distribute on several manifolds. Some revisions 
have been made on the original multi-cluster manifold learning 
algorithm called D-C Isomap [24] such that the revised D-C 
Isomap can learn multi-manifolds data. Finally, the features and 
^ effectiveness of the proposed multi-manifolds learning algorithms 
are demonstrated and compared through experiments. 

Index Terms — Isomap, nonlinear dimensionality reduction, 
manifold learning, pattern analysis, multi-manifolds learning. 
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I. Introduction 

. Challenges, known as "the curse of dimensionality", are usu- 
ally confronted when scientists are doing researches on high 
dimensional data. Dimensionality reduction is a promising tool to 
circumvent these problems. Principal component analysis (PCA) 
[1] and multidimensional scaling (MDS) [2] are two important 
linear dimensionality reduction methods. Due to their linear 
model assumption, both of the methods will fail to discover 
nonlinear intrinsic structure of data. 

■ Recently, there are more and more interests in nonlinear di- 
mensionality reduction (NLDR). NLDR is used to learn nonlinear 
intrinsic structure of data, which is considered to be the first step 
of "machine learning and pattern recognition: observe and explore 
the phenomena" [3]. Two interesting nonlinear dimensionality 
reduction methods based on the notion of manifold learning 
[6], isometric feature mapping (Isomap) [4] and local linear 
embedding (LLE) [5], have been introduced in SCIENCE 2000. 
LLE assumes that data points locally distribute on a linear 
patch of a manifold. It preserves local linear coefficients, which 
best reconstruct each data point by its neighbors, into a lower 
dimensional space. Isomap is based on classical MDS method. 
Instead of preserving pairwise Euclidean distance, it preserves 
geodesic distance on the manifold. The geodesic between two 
data points is approximated by the shortest path on a constructed 
graph. Both of the methods are computational efficient and able 
to achieves global optimality. Besides, there are many other 
important nonlinear dimensionality reduction methods. Laplacian 
eigenmap [7] utilizes the approximation of the Laplace-Beltrami 
operator on manifold to provide an optimal embedding. Hessian 
LLE [8] resembles Laplacian eigenmap by using the approxi- 
mation of Hessian operator instead of Laplacian operator. Local 
tangent space alignment(LTSA) [9] method learns local geometry 
by constructing a local tangent space of each data point and 
then aligns these local tangent spaces into a single global coordi- 
nates system with respect to the underlying manifold. Diffusion 
maps [10] applies diffusion semigroups to produce multiscale 
geometries to represent complex structure. Riemannian manifold 
learning (RML) [11] method uses the constructed Riemannian 



normal coordinate chart to map the input data into a lower 
dimensional space. NLDR is fast developed and has been proved 
very useful in many fields and applications, such as classification 
using Isomap [16] and laplacian eigenmap [17], geometric based 
semi-supervised learning method using laplacian eigenmap [18], 
data visualization [19], time series manifold learning [20], [21] 
and so on. 

As Isomap emphasizes on the global geometric relationship of 
data points, it is very illustrative in data visualization and pattern 
analysis [13]. Although Isomap algorithm implicitly requires the 
data set is convex, it still demonstrates very meaningful results 
on non convex data sets. Therefore, in this paper, we will focus 
attention on extending Isomap to multi-manifolds learning. The 
first step of Isomap algorithm is to construct a neighborhood 
graph which connects all the data points. This step is of vital 
importance because the success of following steps depend on how 
well the constructed neighborhood graph is. However, it is hard to 
build a totally connected neighborhood graph and guarantee the 
topological stability of classical Isomap algorithm when points 
of the data set distribute on clusters in a manifold or manifolds 
(multiple manifolds). Many works have been done on extending 
Isomap to multi-manifolds data. Some methods try to do this 
by providing new neighborhood graph construction algorithms. 
Yiming Wu et al [23] introduced a split and augment procedure 
for neighborhood graph construction which could produce a 
totally connected neighborhood graph. Li Yang [26]-[29] intro- 
duced several neighborhood graph construction algorithms using 
techniques from discrete mathematics, graph theory. Deyu Meng 
et al [24] proposed a decomposition and composition Isomap (D- 
C Isomap). 

The rest of the paper is organized as follows: In Section 
II, main issues and limitations of classical Isomap algorithm 
are presented. The problem of multi-manifolds learning is also 
investigated. In Section III, previous methods on multi-manifolds 
learning are briefly introduced and discussed. In Section IV, a 
general procedure for designing multi-manifolds learning algo- 
rithms is first proposed. With the proposed procedure, a new 
multi-manifolds learning algorithm (M-Isomap) is designed and 
analyzed. With some revisions on the original algorithm, the main 
limitations of D-C Isomap are resolved. Finally, in Section V, 
the effectiveness of these multi-manifolds learning algorithms 
has been demonstrated by experiments. Comparisons of these 
algorithms have also been made. 

II. Classical Isometric Feature Mapping and Its Limitations 

Isomap is able to recover the intrinsic geometric structure 
and converge as the number of data points increases [4] if 
data lie on a manifold, . Like PCA and MDS, Isomap has the 
advantage of simple implementation and computational efficiency. 
The algorithm also guarantees a globally optimal solution. 

It is assumed that data set X = {xi,X2,--- ,Xf^] is in high 
dimensional space and the feature space is R'^ . The classical 
Isomap algorithm has three steps. 

Step 1: Identify the neighbors for all the data points to 
construct a neighborhood graph. With the given parameter 
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k or e, there are two ways to construct a neighborhood graph 

forX: 

• if Xj is one of x,'s k nearest neighbors, they are 
connected by an edge (the k-NN method). 

• Xi and Xj satisfy \\Xi - Xj\\ < e, they are connected by 
an edge (the s-NN method). 

Step 2: Use Dijkstra's or Floyd-Warshall algorithm to 
compute the length of shortest path da{Xi,Xj) between any 
couple of data points x, and xj on the graph. It is proved 
that dc(Xi, Xj) is a good approximation of geodesic distance 
dM(Xi,Xj) on the manifold as the number of data points 
increases. 

Step 3: Perform classical MDS on the graph distance matrix 
Do whose (i,j)-th element is ddi,])- Minimize a cost 
function 

E(Y) = \\T{DG)-r{DY)\\F, 

where H = 



zee' , S = (Dfj), I is the identity matrix and e 



The operator t is defined as t(D} 
J _ l„^T (J _ rn2 

(1, 1, • • • , 1)^ . Dy = (Wyi-yjW)- Assuming that, in decreasing 
order, /I, is the ;-th eigenvalue of t{Dg) and v, is the 
corresponding eigenvector to Ai, then the low dimensional 
embedding Y is given by: 



y = \yu,y2. 



,yn\ = 



The properties of Isomap algorithm are well understood [14] 
[12]. However, the success of Isomap algorithm depends on 
two issues. One issue is how to choose the correct intrinsic 
dimensionality d. Setting a lower dimensionality d will lead to a 
loss of data structure information. On the other hand, setting a 
higher dimensionality d will cause that redundant information is 
kept. This issue has been well investigated. The other issue is the 
quality of the constructed neighborhood graph. It is known that the 
problem on neighborhood graph construction is still a tricky one. 
Both the k-NN and e-NN methods have their limitations. Under 
the assumption that data points distribute on a single manifold, if 
the neighborhood size /: or e is chosen too small, the constructed 
neighborhood graph will be very sparse. Thus geodesies can not 
be satisfyingly approximated. Otherwise, if neighborhood size 
A: or e is chosen too large to cause short-circuit edges, these 
edges will have a significant negative influence on the topological 
stability of Isomap algorithm [22]. 

Nonetheless, it is a relative simpler problem if data points 
distribute uniformly on one manifold. Both the "short circuit" 
and "discontinuity" problem can be circumvented by carefully 
choosing an appropriate neighborhood size k or e. It is a diff^erent 
problem if the data distribute on clusters or manifolds. k-NN or 
e-NN do not guarantee that the whole data set is totally connected 
and the quality of approximated geodesies. 

In real world, "data missing" and "data mixture" are common 
problems in data analysis. Under manifold assumption, these 
two problems cause that data distribute on diff'erent clusters in 
a manifold or manifolds. Here, the main problems of multi- 
manifolds learning are presented, the data may have these proper- 
ties: First, data points on difl'erent manifolds may have different 
input dimensionality D (dimensionality of the ambient space). 
This usually happens in "data mixture" cases. Second, learning 



different data manifolds may need different value of input pa- 
rameters, i.e. appropriate neighborhood size {k or e) and intrinsic 
dimensionality d for each data manifold. The case when data 
points distribute on pieces of a single manifold is referred to as 
multi-cluster manifold learning; meanwhile, the case when data 
points distribute on multiple manifolds is referred to as multi- 
manifolds learning. In this paper, we will concentrate on designing 
multi-manifolds learning algorithms to data with these properties. 

III. Ftavious Works on Multi-Manifolds Learning 

A. Multi-manifolds learning by new neighborhood graph con- 
struction method 

Wu and Chan [23] proposed a split-augment approach to 
construct a neighborhood graph. Their method can be regarded 
as a variation of the k-NN method and can be summarized as 
below: 

1. k-NN method is applied to the data set. Every data point is 
connected with its neighbors. If the data lies on multiple 
manifolds, several discormected graph components (data 
manifolds) will be formed. 

2. Each couple of graph components are connected by then- 
nearest couple of inter-components points. 

This method is simple to implement and has the same com- 
putational complexity as k-NN method. However, as there is 
only one edge connecting every two graph components, geodesies 
across components are poorly approximated; meanwhile, their low 
dimensional embedding can be rotated arbitrarily. This method 
can not be directly applied to data lying on three or more 
data manifolds. If more than two graph components exist, intra- 
component shortest distances may be changed in the totally 
connected graph. 

Li [26]-[29] introduced four methods to construct a connected 
neighborhood graph. The k minimum spanning trees (k-MST) 
[26] method repeatly extracts k minimum spanning trees (MSTs) 
from the complete Euclidean graph of all data points. Edges of 
the k MSTs form a k-connected neighborhood graph. Instead 
of extracting k MSTs, the minimum-k-spanning trees (min-k- 
ST) [27] method finds k edge-disjoint spanning trees from the 
complete Euclidean graph, and the sum of the total edge length 
of the k edge-disjoint spanning trees attains a minimum. The k- 
edge-connected (k-EC) [28] method constructs a connected neigh- 
borhood graph by adding edges in a non-increasing order from the 
complete EucUdean graph. An edge is added if its two end vertices 
do not have k edge-disjoint paths connected with each other. The 
k-vertices-connected (k-VC) [29] method add edges in a non- 
increasing order from the complete Euclidean graph, an edge is 
added if its two end vertices would be disconnected by removing 
some k - I vertices. Finally, the constructed neighborhood graph 
would not be disconnected by removing any k-l vertices. 

The methods introduced in [26]-[29] advantage over k-NN 
method for two reasons: First, the local neighbor relationship 
is affected by the global distribution of data points. This is 
beneficial for adaptively preservation of the global geometric 
metrics. Second, these methods could guarantee that the con- 
structed neighborhood graph is totally connected. Compared with 
k-NN method, Li's methods construct a neighborhood graph with 
more edges corresponding to the same neighborhood size k. This 
property can assure the quality of the neighborhood graphs. 
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B. Multi-manifolds learning by decomposition-composition 

Isomap 

In [24], Meng et al. proposed a decomposition-composition 
method (D-C Isomap) which extends Isomap to multi-cluster 
manifold learning. The purpose of their method is to preserve 
intra-cluster and inter-cluster distances separately. Because a 
revised version of D-C Isomap will be introduced in the next 
section, we present the details of D-C Isomap algorithm in the 
following 

Step I: decomposition process 

1. Given neighborhood size k or e, if the data is of multi- 
cluster, several discormected graph components can be iden- 
tified when every data point is connected with its neighbors 
by k-NN or s-NN. 

2. Assuming that there are M components, the w-th compo- 



nent is also denoted as a cluster 



Clusters 



X™ and X" are connected by their nearest inter-cluster data 
points n:)^ and nx!^ whose pairwise distance is assumed to 
be <„. 

3. Apply k-NN Isomap or e-NN Isomap on each cluster X". 
Denote the geodesic distance matrix for as D'" = {D^ ), 
the corresponding low dimensional embedding as F™ = 
{Yi' ■ ■ ■ > }' embedding point corresponding to n:i^ 

as ny^, where ny^ £ V" . 

Step II; composition process 

1. The set of centers of clusters is denoted as CX = 
{cxi, ■■■ , cxm), where every center is computed by 



cxm = argmin 



m = 1, • • • ,M. 
2. The distance matrix for CX can be computed by 



D = {D^. 



0, 



m = n 



where d„ „ is the distance of shortest path between cx,„ and 
nx'" on the graph component X™. 
3. Plug distance matrix D and neighborhood size d into clas- 
sical Isomap algorithm. The embedding of CX is denoted 
by CY = [cyi, • • • , c>'m) c (CY is called the translation 
reference set). Assuming that the d nearest neighbors of cx^ 
are {cx^j,--- ,cx„J, the low dimensional representation 
corresponding to nx^. is computed as 



cy„-\-' 



-icy„. - cy„) 



i = I, ■ ■ ■ ,d, m = I, - ■ ■ , M. 

4. Construct the rotation matrix J?l,„ for 7'",m = ,M. 
Assuming that QN^ is the principal component matrix of 
NY^ = {ny^i, ■ ■ ■ ,ny'^J and QS „ is the principal compo- 
nent matrix of SY„ = {sy2ir ■ ■ , I' then the rotation 
matrix for Y"" is Jl„, = QsZqnI- 

5. Transform Y"',m = l,---,M into a single coordinate 
system by Euclidean transformations 

Fy'" = {/)f = :?l„3;f + cy„,? =!,••• ,U, m=l,---,M. 
Then Y = U^=i FY^ is the final output. 



Firstly, the D-C Isomap reduces the dimensionality of clusters 
separately; meanwhile, it preserves a skeleton of the whole data. 
Secondly, using Euclidean transformations, embedding of each 
cluster is placed into the corresponding position by referring 
to the skeleton. In this way, intra-cluster geodesies are exactly 
preserved. As D-C Isomap method uses circumcenters to construct 
the skeleton of whole data, its learning results unstably depend 
on the mutual position of these circumcenters. It is known that at 
least d+l reference points are needed to anchor a cZ-dimensional 
simplex. However, in D-C Isomap algorithm, the number of the 
reference data points is limited by the number of clusters. 

C. Constrained Maximum Variance Mapping 

There is also a newly proposed algorithm called constrained 
maximum variance mapping (CMVM) [25] for multi-manifolds 
learning. CMVM method is proposed on the notion of maximizing 
dissimilarities between classes while holding up the intra-class 
similarity. 

IV. Isometric Multi-Manifolds Learning 
A. The general procedure for isometric multi-manifolds learning 

Many previous methods extend Isomap for multi-manifolds 
learning by revising the neighborhood graph construction step 
of the Isomap algorithm [23], [26]-[29]. However, shortest paths 
across clusters or data manifolds are bad approximations of 
geodesies. In Isomap, bad local approximation always leads to 
the deformation of global low dimensional embedding. 

Under the continuous assumption, it is assumed that D is an 
open, convex and compact set in R'', and / : D — » , d « D 
is a continues mapping, /(fl) = Ai is defined as & d dimen- 
sional parameterized manifold. K{x,y) (x,y e M) is a specially 
defined kernel and a reproducing kernel Hilbert space (RKHS) 
is constructed with this kernel. <^j{x) is the eigenfunction 
corresponding to the j-th largest eigenvalue Aj of K{x,y) in .3^, 
which is also the y'-th element of Isomap embedding. The geodesic 
distance on manifold M is written as (f(x,y) = <f(J{T), fir)) = 
a\\T - t\\ + t]{t,t), where t, t e f2, a is a constant and /7(t,t) is 
the deviation from isometry. The constant vector 

^ J^xp{x)dx _ J^-^rH(T)dT 

" lMP(^)dx " i/K(T)dr 

where p{x) and 'H(t) are density functions of M and Q. With 
the assumptions above, the following theorem is proved by Zha 
et al [12]. 

Theorem 4.1: There is a constant vector P, such that <^j{x) = 
Pj {t - C) + ej{T), where e/r) = ey - ej{r) has zero mean, i.e., 

J^'H(T)ejiT)dT = 0, with ej(T) = ^ J^T](r,r)'Hir)<^jix)dr, ef = 
j^/j^ej(rmr)dr. 

By theorem 4.1, even if the deviation r](T,T) is not zero with 
only a limited range of (t,t). The coordinate of the low dimen- 
sional embedding (pj(x) is still deformed, and the deformation is 
measured by e/r). 

In order to get a better vmderstanding of multi-manifolds data, 
it is profitable to preserve intra-manifold relationship (where 
77(7,7) = 0) and inter-manifolds relationship (where r](T,T) ?t 0) 
separately. This is because that sometimes we care more about 
the information within the same data manifold. Here we propose 



4 



TABLE I 

Symbols and Variables used in the Algorithms 



X = The total data set, with Xi e 

X'" = {jcf The m-th data manifold, where m = 1, • • • , M 

Y"' = {y'"]^^\ Low dimensional embedding of X'" 

Dmn = (di(i, i)) Matrix of geodesic distances across data manifolds X'" and X" 

fx^, fx"^ The furthest couple of data points between X'" and X" 

{■"C(i)l;=i Subset of X'", data points of which are the nearest ones to X" 

!>" = The selected data points from X'" to construct the skeleton / 

^ = U™=i Points which are used to construct a skeleton I of X 

D[ Approximated geodesic distance matrix for skeleton 7 

RYi Low dimensional embedding of the skeleton / 

RY"' c RYi Low dimensional embedding of /"', which will also be referred 

as the transform reference for F"' 

nx'j The point in X' which is the nearest to X', or the inter-cluster point 

in D-C Isomap algorithm. 



a general procedure for isometric multi-manifolds learning algo- 
rithms. 

Step L The decomposition process 

1. Cluster the whole data set. If data distribute on different 
clusters in a manifold or manifolds, the clusters or mani- 
folds should be identified. Many clustering method could 
be used, such as K- means, Isodata and methods introduced 
in [15] [33]. Even if the manifolds overlay with each other, 
they can still be identified and clustered [39]. At this step, 
data set X is clustered into several components and every 
component is considered as a data manifold. 

2. Estimate parameters of data manifolds. For intrinsic di- 
mensionality estimation, many methods can be used: the 
fractal based method [34], MLE method [35], [36], and the 
incising ball method [37]. Assume that d,,, is the intrinsic 

dimensionality of the m-th data manifold. Let d = max J,,,. 

Ill 

For neighborhood size, [32] introduces a method on auto- 
matically generating parameters for Isomap on one single 
data manifold. For convenience, appropriate neighborhood 
sizes {k,„ or e,„ for X'") could be given manually for data 
manifolds. 

3. Learn the data manifolds individually. One data manifold 
can be learned by traditional manifold learning algorithms. 
Here, we propose to rebuild a graph on each data manifold 
with new neighborhood size to better approximate intra- 
manifold geodesies. Methods of Li's works [26]-[29] or 
the k-CC method is preferred, where the k-CC graph 
construction method will be described later. It is assumed 
that the low dimensional embedding for X'" is Y"' . 

Step II: The composition process 

1. Preserve a skeleton / of the whole data set in low dimen- 
sional space . The skeleton / should be carefully designed 
such that it can represent the global structure of X. Let RYi 
be the low dimensional embedding of I. 

2. Transform F'"s into a single coordinate system by referring 
to 7. In order to faithfully preserve intra-manifold rela- 
tionship, Euclidean transformations could be constructed 
and used. Using embedding points RY'" c RY' and cor- 
responding points from Y"\ we can construct an Euclidean 
transformation from F'" to the coordinate system of 7. 

Although the idea about decomposition-composition is not new. 



TABLE II 

Computational complexity comparison of k-NN, k-MSTs, Min-k-ST, k-EC 
and k-vc. tc stands for time complexity and il stands for the time 
complexity for incremental learning 

k-NN k-MST Ming-k-ST k-EC k-VC 
TC 0(kN-) (XFW) (XFW) 0(J?W) O(N^) 
IL O(kN) 0(N\ogN) 0(N\ogN + kN) 



which is first used by Wu et al. [23] in their split-augment process 
and well developed and used in [24]. The procedure we proposed 
here aims to solve a more general problem. Step I.l permits that 
the designed learning method has a good ability to identify data 
manifolds. Step 1.2 gives a guideline on learning manifolds with 
different intrinsic dimensionality and neighborhood sizes. Step 
1.3 learns data manifolds individually so that the intra-manifold 
relationship can be faithfully preserved. Step II. 1 is the most 
flexible part of the procedure which allows us to design new multi- 
manifolds learning algorithms. A well designed skeleton 7 could 
better represent the inter-manifolds relationship. In the following, 
we will introduce a new multi-manifolds learning algorithm and 
revise the original D-C Isomap algorithm with the help of this 
general procedure. 

B. A new algorithm for isometric multi-manifolds learning 

Based on the proposed procedure, we designed a new multi- 
manifolds learning algorithm. As an extension of Isomap method 
for multi-manifolds data, the method will be referred to as multi- 
manifolds Isomap (M-Isomap). It is assumed that X is also 
interchangeable to represent the matrix [xi,X2,--- ,xt^'\, where 
{xi, i = 1, • • • , A') are column vectors in R^. 

1 ) Using k-CC method to construct a neighborhood graph and 
identify manifolds: Table |ll] shows the time complexity of k- 
NN, K-Min-ST, k-EC and k-VC methods on neighborhood graph 
construction. As it is shown in the table, k-NN method has the 
lowest computational complexity O(kN^). 

For incremental learning, the computational complexity of k- 
NN, k-MSTs and k-VC are 0{kN), 0{N log N) and 0(N log N-\- 
kN) respectively [30] [31]. The computational complexity of Min- 
k-ST, k-EC methods for incremental learning are unavailable. For 
data on one single data manifold, the improvement of performance 
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of Li's methods becomes insignificant when the neighborhood 
size k for k-NN method increases. More importantly, k-NN 
implicitly has the property of clustering to multi-manifolds data. 
Data points of the same manifold tend to be connected by paths 
and disconnected otherwise when every data point is connected 
with its neighbors by edges. Although k-NN is not a robust 
clustering algorithm, it is computational efficient for both cluster- 
ing and graph construction. Therefore, we introduce a variation 
of k-NN method which inherits computational advantage of k- 
NN method. The method is also able to identify data manifolds 
and construct a totally connected neighborhood graph. In the 
rest of the paper, the proposed neighborhood graph construction 
method will be referred as k-edge connected components (the 
k-CC method). 

The summary of k-CC algorithm: First, given a neighborhood 
size k or e, every data point is connected with its neighbors. 
If the data points distribute on several clusters or manifolds, 
several disconnected graphs will be constructed. Data points are 
assigned to the same data manifold if there is a path connects 
them on the graphs. Then, we connect each pair of graphs by 
k nearest pairs of data points. Concerning about robustness of 
the algorithm, every data point is only allowed to have one 
inter-manifolds edge at most. 

Algorithm:(k-CC method) 

Input: Euclidean distance matrix D, whose {i,j)-th entry is - 
Xj\\. Neighborhood size k or e. 

Output: Graph G = {V, E), number of clusters M, label of the 
data. 



which identify components (data manifolds) and connect different 
components of the graph. This change makes the constructed 
graph totally k-edge connected. Compared with the method 
proposed in [23], k-CC method constructs a neighborhood graph 
with k inter-manifolds edges, which is able to control the rotation 
of the embedding of data manifolds. In Section V, the method, 
which uses k-CC to construct a totally connected graph and then 
perform classical Isomap on the graph, will be referred to as k- 
CC Isomap. It can be easily inferred that k-CC Isomap suffers 
the limitation which has been shown by Theorem 4.1. 

At this step, it is assumed that X is clustered into data manifolds 
and \x"„\i^]';^i is the subset of X'" whose data points 
connect with X"^'\ i = 1, ■ • • ,k. 

2) Learn data manifolds individually: As X'" is considered as 
a single data manifold in R^, it is possible to find its intrinsic 
parameters. The incising ball method [37] is utilized to estimate 
the intrinsic dimensionality, which is simple to implement and 
always outputs an integer result. Assume that d is the highest 
intrinsic dimensionality of data manifolds. Neighborhood size k,„ 
or £,„ of each data manifold is given manually and the graph on 
data manifold X'" is rebuilt. It is hoped that the new neighborhood 
graph on X'" can make better approximations of intra-manifold 
geodesies. The approximated geodesic distance matrix for X"' 
is written as D,„. By applying classical MDS on Z),„ , the low 
dimensional embedding for X'" can be got as Y"' = {yflfjj. 

3) Preserve a skeleton of data manifold X: First, inter- 
manifolds distances are computed. Assuming that x'j' and are 
any data points with x'^ e X'" and x'^ e X" , their distance can be 
computed by 



Initialization: V = {xi. 



, x„}, V = V, E = Queue ■ 



9: 
10: 
11: 
12: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 



23 
24 
25 



for i=I to do 

Identify nearest neighbors [xn, - ■ ■ , xu. | for x, by A:-nearest- 
neighbors or e-nearest-neighbors. Let E = E [jien, ■ ■ ■ e,/.) 

end for 

Set M = 1 

while {V is not empty) do 
X e V, in-Queue{xl, label(x)=M, V = V - {x} 
while {Queue is not empty) do 
x=de-Queue 

Vy: y is connected with x by an edge 
if ly is not labeled) do 

in-Queue|y), label(y)=M, V = V - [y] 
end if 
end while 
M = M+l 
end while 
M = M- 1 
if ( M > 2 ) 
k =average({/i)J!j) 
for i = 1 to M do 
for j = i + I to M do 

Find k shortest inter-manifolds edges ei , • • ■ , ca- between 
data manifolds i and j and make sure that their ending 
vertices are not identical. Let E = E {j{ei, ■ • • , ek] 
end for 
end for 
end if 

The main difference between k-NN and k-CC is lines (4-25), 



d(x^, x"^) = min{J,„(x;, <(,,) + IK-^,), x"^,,^\\ + xp), (1) 



where d{x"p, x|"^,)) is the shortest path on the neighborhood graph 
of X'". Although £/(x'^', yjj^j) may not be the shortest path on 
the totally connected graph of X, Eq. (T]! is an efficient way to 
approximate distances across manifolds. D„,„ is assumed to be 
the distance matrix across over X™ and X". The furthest inter- 
manifolds data points are computed by 



{/<, K,) = arg max d{x^, xj), di^, ^l) e D„, 



Without lose of generality, we assume /" 



(2) 



U=ii<'(i)'--- ^Kikyf^^ ■ t = U;„=i/"' is considered as the 
global skeleton of X. On the data manifold X, it can be seen 
that the skeleton / formulates a sparse graph. We assume that 
Dj = {d[(i, j)) is the distance matrix of /, where 



d,{ij) 



d(x^,x"j)eD,„„, XiEX'^xjeX" 
d(x^,xr^)eD,„, Xi,XjeX"' 



(3) 



By applying classical MDS algorithm on /)/, the low dimen- 
sional embedding of / can be got as RYi. It is assumed that 
RY'^ c RYi is the embedding of /'" and RY'," = (ry™)!™;. 

4) Euclidean transformations : Assuming that Y"' = {y"']'|!l^ C 
y™ and y™ corresponds to x"", an Euclidean transformation from 
Y'^ to RY'I' could be constructed. 



The general Euclidean transformation can be written as 



where is an orthonormal matrix and jS is a position transla- 
tion vector. For the m-th data manifold, it is assumed that the 
Euclidean transformation is 



Generally, it could be written in form of matrix 



(4) 



where e is a vector with all ones. Problem (O can be solved by 
the least square strategy, and the solution is 



(5) 



where / is the identity matrix and A is the regularization parameter 
in case singular. However, least square solution does not provides 
an orthonormal matrix J?l„, . Here we propose to use the orthonor- 
mal matrix which is computed by QR decomposition. The QR 
process can be written as 



(^„. r) = qr{:r„,). 



(6) 



with the diagonal elements of R to be forced non negative. Then 
P,„ can be recomputed by minimizing a cost function 



C06J =Yj\\^'ny7 +Pm-ry'7\? 



By taking derivative = o, we have 



Input: X = {^,1^1 with Xi e R". Initial neighborhood 
size k or e. 

Step 1.1 Perform k-CC on X. Data manifolds {X'"]'^^^^ 
and the set of inter-manifolds points 
of X"' can be obtained. 

Step 1.2 Estimate parameters of data manifolds. It is 
assumed that intrinsic dimensionality d„, and 
neighborhood size (k,„ or £„,) are parameters for 
X"' . Let d = max{d,„] and rebuild neighborhood 

m 

graph on X'". 

Step 1.3 Classical Isomap is performed on the new graphs 
superimposed on X'", m = 1, ■ ■ • ,M. The 
corresponding low dimensional embedding of X'" 
is denoted as Y'". 

Step n.i 



step II.2 



Inter-manifolds distance matrix D,„„ is computed 
by Eq. ([TJ, thus {fx"„'}^,^„ can be found by Eq. (O. 
Distance matrix D[ for the skeleton / is computed 
by Eq. ((Sjl. Applying classical MDS on Dj, we 
denote the low dimensional embedding of / as RYi. 
RY'" c RYi and RY'" is assumed to be the embedding 
of /'". 

Construct Euclidean transformations by Eq. iSllj . 
Using the Euclidean transformations, it is assumed 
that Y'", m = 1, • • • , M are transformed to RY'", 
m = 1, • • • ,M. 

Step II.3 Y = U^^i RY'" is the final output. 

C. Computational complexity of M-Isomap method 

Computational complexity is a basic issue for application. For 
M-Isomap method, k-CC method needs 0{{k + \)N^) time to 
construct a totally connected graph and identify the manifolds. 
Computing the shortest path on every data manifold needs 
OiXim=\Sj„^ogS ,„) time, and performing classical MDS on the 
distance matrixes of data manifolds needs 0{Y!^^\ S^„) time. The 
time complexity of computing the shortest path across data man- 
ifolds is Oi'ZmKnkS „iS n) and finding fx'^Jx'. is 0(Zm<„ 5'„,5„). 
Performing classical MDS on skeleton / needs 0{(Yim=i Im f) 
computational time. The time complexity of least square solution 
and QR decomposition process for M data manifolds is 0{Md^). 
Finally, transforming F"s into a single coordinate system needs 
O(d^N) computational time. 

Therefore, the total time complexity of the M-Isomap method is 



0{{k + l)N^ + YjiSl +SilogS,„) + Y,{k+ l)S,„S„ 

m=l m<n 
M 

+(^ Imf + Md^ + d^N) 



P,n = ^J](ry';'-:n,„yn 



(7) 



For a large data set when N » M and N » d, the overall time 
complexity of M-Isomap can be approximated by 



Low dimensional embedding Y,„, i = 1, • • • ,M could be trans- 
lated into a global coordinate system by the constructed Euclidean 
transformations. 

5) The complete algorithm of M-Isomap: To give an compact 
presentation of M-Isomap, the algorithm is summarized in the 
following table. 



0{{k + \)N^ + Yj^sI + si logSJ + Yjik + l)S,nS„) 



D. The revised D-C Isomap method 

D-C Isomap applies the decomposition-composition procedure. 
Therefore, it is able to preserve intra-cluster distances faithfully. 
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(a) VxJ 



Fig. 1 



(b) 




Two BASIC CASES OF THE RELAnONSHIP OF THE CENTER AND ESITER-MANIFOLDS POINTS 



Fig. 2 

An ILLUSTRATION OF HOW TO ADD A NEW CLUSTER FOR D-C ISOMAP ALGORITHM. 



However, this method suffers from several limitations. In the 
following, revisions will be made on the original D-C Isomap 
algorithm to overcome its limitations. 

7 ) Selection of centers: D-C Isomap implicitly assumes that 
the inter-cluster point nx^ is in the line which connects centers Om 
and nxJJ,. Thus it is more sensible that 0„, is chosen by referring 
to the inter-cluster points. Fig. 1 illustrates two basic cases about 
the relationship of the center and inter-cluster points. Although 
the points nx}, n.JCj, wx^ nx\ and 0\ do not have to really lie 
on the same plane in the ambient space. It is assumed that these 
points formulate a triangle in the low dimensional space. Fig. 1(a) 
shows the case when Lnx\n}?^nx[ + lnx\nx[n:^^ < 180°. 

In triangle AOinx^nx^, the edge d{ny?i,nx\) can be computed 
as llnxj - nxjil. We also have 



lOinx^nxl = arc cos 
lOinx^nx^ = arc cos 



< nx\ 



■ nx\,nx 



J nx\ > 



< nx. 



d„ distances d'(Oi,nxj), i = 1, • • • , (i„ are needed to estimate the 
position of center Oi, and in this case f(o) is given by 



fio) = Yj\\d(Ounxl)-d'(Ounxl) 



If we can not find sufficient d'(Ounxj)s to locate the center, 
there must be many inter-cluster points located in space as 
illustrated in Fig. 1(b). In this case, we have lnx\nx\nx\ + 
ZnxjHXjMXj > 180°, when the center 0\ could never be in the 
line of nx\nx\ and nx\nx\. In order to get a better preservation 
of inter-cluster relationship, 0\ should be placed as far as pos- 
sible from these inter-cluster points. For a cluster with intrinsic 
dimensionality 2, it is suggested that 0\ should be chosen by 



Oi = argmax{g(o)} 



(9) 



where 



Subsequently, the length of edges d{Oi , nx\) and d{Oi , «Xj) can be 
calculated by the Law of Sines in AOinx^nx^. Suggested distances 
between center to inter-cluster points can be calculated as 



d'{Ox,nx\) 
d'{0\,nx\) 



d(Oi,n>^) ■ 
d(Oi,nxl) ■ 



\\nxl ■ 



For a cluster with intrinsic dimensionality 2, it is sufficient to 
estimate position of Oi in the cluster by solving the following 
optimization problem: 



Oi = arg min/(o) 



(8) 



where 



f{o) = Y,\\d{Ounxl)-d'(Ounxl) 



1=1 



Here d(Oi , nx?) is the length of shortest path between Oi and nx\ 
on graph . For a cluster with intrinsic dimensionality dm, at least 



g(o) = d(o,nx\) + d{o,nx\) - \\d(o,nx\) - d{o,nx\)\\ 

If the intrinsic dimensionality of is d„, and [nx], i = 1, • • • ,d,„} 
is the set of inter-cluster points in , the function g(o) should be: 

d,„ 

g(o) = ^ (d(o, nx] ) + d{o, «xj) - \\d{o, nx]) - d(o, nxj)||) 
'<J 

2) Degenerative and unworkable cases: As original D-C 
Isomap algorithm relies on the position of centers to preserve 
inter-cluster relationship, the algorithm can not work on data 
under some circumstances. Considering about a simple case of 
two data clusters with d = 2, the method does not work because 
that it implicitly requires an another data cluster to provide 
sufficient rotation reference data points. Because that the low 
dimensional embedding of each clusters are relocated by referring 
to position of the centers. When there are three or more clusters 
and the centers of them are nearly in a line, the original D-C 
Isomap can not find the exact rotation matrix. 

Therefore, we propose an algorithm to solve the problems 
by adding new clusters. This algorithm applies a trial and error 
procedure to determine the position of the new clusters. In the 
following, the case about two clusters is used as an example. 



As shown in Fig. 2, the nearest couple of inter-cluster points of Input: 



clusters and are assumed to be nx 



and 



mi is the 

middle point of nx\ and nxy The second nearest couple of inter- 
cluster points are njCj and nx^. 1112 is the middle point of them. 
Then the third cluster X^ is suggested to be produced by 



X' 



■ nil 



■y\\nx\ 



7,, nil - mi 



Wm - mill 



The parameter y can be decided by a trial and error procedure. 
Given a positive value > 1, it is assumed that X^ should satisfies 



1 



\\x'-x'\\ 



(10) 



where is the shortest distance between data points from 

clusters and X^ . If condition dlOb is not satisfied, y changes in a 
pre-given range like {• • • , -3, -2, -1, - -i, • • • , i, 5, 1, 2, • • • ). 

When there are M clusters in the data set with M < d + \, 
we can start from the couple of clusters with maximum nearest 
inter-cluster distance. Assume that X' and X^ satisfy 



\\X^ -X-\\= maxminllX' -X-'ll 



and nx\, nx^, mi, nx^, nx^, m2 are defined as above. The Mh 
cluster x^'+i could be generated as 



1-th 



X = Uil^j, with Xi e : 
size k or e. 



Initial neighborhood 



Step I.l The same to Step I.l of the original D-C 
Isomap algorithm. 

Step 1.2 Estimate parameters, intrinsic dimensionality 
{dm]^=i and neighborhood sizes {{k,„}1f^^^ or 
{£hi1'^=i), of clusters. Let d = rnaxd,,, and 

in 

rebuild neighborhood graphs on clusters. If 
M < d + I, new clusters should be added until 
M>d+\. 

Step 1.3-4 The same to Step 1.2-3 of the original D-C 

Isomap algorithm. 
Step II. 1 Centers of clusters are computed by ^ or 

l|9ll. New clusters should be added until centers 

could anchor a d dimensional simplex. 
Step II.2-4 The same as Step II.2-4 of the original D-C 

Isomap. It is assumed that Y'" is transformed 

to TY'". 

Step II.5 Y = U,^.i TY'" is the final output. 



V. Experiments 



A. 3-D data sets 



In this subsection, we compare k-CC Isomap, M-Isomap and 
the revised D-C Isomap on three 3-D data sets. It should be noted 
that during all experiments, the size of the neighborhood is chosen 
corresponding to the best performance of each algorithm. 

Fig. 3(a) is a two-manifolds data set with A' = 1200 data 
points, and the data set is generated by the following matlab code: 

t=(l*pi/6)*(l+2*rand(l,N)) ; 
xx=t. "cos(t) ;yy=t.*sin(t) ; 

zz =[unifrnd(l, 10,1, N/2) unifrnd(16,25, l,N/2)] ; 
X=[xx;zz;yy] ; 



X" 



- mi + yllnJCj — nx^ 



m2 — mi 
Wm - mil 



If XP and XI are the two nearest clusters of X*^"^', given yS > 0, 
it is assumed that x*'"^' should satisfies 



||X* 



■XP\\ 



1 



/5 \\X'^+^-Xi\\ 



<P. 



IfM-nl < d + \, replace M hy M + \ and repeat the generating 
procedure presented above. 

PCA can be used to find out the dimensionality of the subspace 
on which the centers are lying. A new cluster is also needed if 
the dimensionality of the subspace is lower than d. New clusters 
should be added until the centers could anchor a d dimensional 
simplex. 

3) The complete algorithm of the revised D-C Isomap: 
To give a compact representation of the revised D-C Isomap 
algorithm, and compare the difi^erence between the revised and 
original D-C Isomap algorithm, the integral revised D-C Isomap 
algorithm is presented as bellow: 



It can be seen that each data manifold is intrinsically a rectan- 
gular region with 600 data points. Fig. 3(b) shows the result got by 
k-CC Isomap, whose neighborhood graph is constructed by using 
8-CC method. It can be seen that the embedding shrinks along the 
edges in low dimensional space and edges of the embedding turn 
into noisy. Fig. 3(c) shows the result got by M-Isomap method 
with neighborhood size = 8. As it can be seen, each of data 
manifold is exactly unrolled, and the inter-manifolds distance is 
precisely preserved. Fig. 3(d) illustrates the initialization step of 
the revised D-C Isomap algorithm. First, two data manifolds X' 
and X^ are identified. Then the third data cluster X^ is constructed, 
where the parameter A = 0.1. Finally, centers Oi and O2 of the 
data manifolds are computed by referring to the nearest neighbors. 
The center of X^ is also the data point X^. Fig. 3(e) shows the 
result of the revised D-C Isomap method. It is can be seen that 
the embedding exactly preserves both the intra-manifold distances 
and inter-manifolds distances. 

Fig. 4(a) is another two-manifolds data set with A' = 1200 data 
points, and the data set is generated by the following matlab code: 

t= [unifrndCpi- 11/12, pi" 14/12, 1, N/2) 

uni£rnd(pi*16/12,pi*19/12, l,N/2)] ; 
xx=t . "cos(tt) ;yy=t . "sin(tt) ; 
zz=unifrnd(l , 25 , 1 ,N) ; 
Y = [xx;zz;yy] ; 

Each data manifold has 600 data points. One data manifold is a 
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(6) 



Fig. 3 

Experiments on two-manifolds data set. (a) The original data set. (b) The result by k-CC Isomar (c) The result by M-Isomar (d) Illustration of the procedure 

OF D-C IsoMAP. (e) The result by D-C Isomar 




Fig. 4 

Experiments on two-manifolds data set. (a) The original data set. (b) The result by k-CC Isomar (c) The result by M-Isomap. (d) Illustration of the procedure 

of D-C Isomar (e) The result by D-C Isomar 



rectangular region and another data manifold is a round region. 
Fig. 4(b) shows the result got by k-CC Isomap with neighborhood 
size k = 10. It can be seen that the rectangular region bent 
outwards and the round region is prolonged. Fig. 4(c) shows 
the result got by M-Isomap method with the neighborhood size 
A: = 8. As it can be seen, every data manifolds is exactly unrolled, 
and the inter-manifolds relationship is precisely preserved. Fig. 
4(d) illustrates the initialization step of the revised D-C Isomap 
algorithm. The parameter A = 0.5 for production of the new 
cluster X^. Fig. 4(e) shows the result of the revised D-C Isomap 
method with neighborhood size k = 5. It is can be seen that the 
embedding exactly preserves both the intra-manifold distances 
and inter-manifolds distances. 



Fig. 5(a) shows a three-manifolds data set with A' = 1600 data 
points on the Swiss roll manifold. The data set is generated by 
the following matlab code: 
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(a) 



; ' x„ 



(b) 



Fig. 5 

Experiments on three-manifolds data set. (a) The original data set. (b) The result by k-CC Isomar (c) The result by M-Isomap. (d) Illustration of the 

PROCEDURE OF D-C IsOMAP. (e) ThE RESULT BY D-C IsOMAR 



tl = [unifrnd(pi*5/6,pi"16/12,l,N/4)] ; 
t2 = [unifrndCpi*18/12,pi*12/6,l,N/4)] ; 
t3=C5"pi/6)*(l+7/5*rand(l,N/2)); 
al=tl.*cos(tl) ; bl=tl. -sinCtl) ; 
cl=[unifrnd(-l,3,l,N/4)] ; 
a2=t2 .*cos(t2) ; b2=t2 . "sin(t2) ; 
c2=[unifrnd(-l,3,l,N/4)] ; 
a3=t3.*cos(t3) ; b3=t3 . "sinCt3) ; 
c3=[uni£rnd(6, 10,1, N/2)] ; 

xl=[al;cl;bl] ; x2=[a2 ; c2 ;b2] ; x3=[a3;c3;b3] 
Z=[xl x2 x3]; 

There are three rectangular regions on the Swiss roll manifold. 
The longest data manifold has 800 data points, the other two 
shorter data manifolds each contain 400 data points. Fig. 5(b) 
shows the result got by k-CC Isomap with neighborhood size 
k = 10. Because of bad approximation of the inter-manifolds 
geodesies, edges of the data manifolds bend outwards. Fig. 
5(c) shows the result got by M-Isomap method, where the 
neighborhood size k is set to be 8. As it can be seen, all 
data manifolds are exactly unrolled, and the inter-manifolds 
relationships of the three data manifolds are faithfully preserved. 
Fig. 5(d) illustrates the initiation step of the revised D-C Isomap 
algorithm. Fig. 5(e) shows the result of the revised D-C Isomap 
method. It is can be seen that the embedding do not exactly 
preserve the inter-manifolds distances. That is because the shape 
of the data manifolds are very narrow. The selected reference data 
points can not efficiently relocate each piece of data manifold. 

B. Real world data sets 

Fig. 6(a) shows some samples of the faces data [38] which 
contains face images of five persons 0. The data set consists of 
153 images and has 34, 35, 26, 24, 34 images corresponding 
to each face. These images are gray scale with resolution of 
112x92. They are transformed into vectors in 10304-dimensional 

' |http://www.cs.toronto.edu/~roweis/data.html| 



Euclidean space. In order to show the inter-manifolds relationship 
with more details, we embed the data into 3-dimensional space. 
Fig. 6(b) is the three dimensional embedding by PCA method. 
It can be observed that data manifolds of faces are mixed up, 
and the intra-face information is also not preserved. Fig. 6(c) is 
the result got by k-CC Isomap method with A: = 3. As it can 
be seen, although the data points are clustered, their inter-face 
distances are not well preserved. The five lines mix up at one 
of their endings. Fig. 6(d) shows the result got by M-Isomap 
method with k = 3. Due to the limitation of k-NN method in 
clustering, only two data manifolds are identified. Although the 
data set is not well clustered, the result of M-Ismap shows that 
the low dimensional embedding can be separated up easily. Fig. 
6(e) is the result got by the original D-C Isomap method, where 
the faces are spit up beforehand. The circumcenters are used as 
their centers. However, as it can be seen, two faces are mixed up. 

Fig. 7(a) shows some samples of the teapot data set with 300 
data points, where 'n' stands for the teapot bird- view images, 'A' 
stands for the teapot back-forth rotation images and 'O' stands for 
the teapot side-view images. Each of the images is an 80x60x3 
RGB colored picture, i.e. a vector in 14400 dimensional input 
space. Because the data points do not distribute on a single global 
manifold, this problem will poses a great challenge to classical 
manifold learning methods. The experiment shows that the three 
data manifolds can be identified by k-CC method. In order to 
show their exact embedding. Fig. 7(b)-(d) present the embedding 
of each data manifold by classical Isomap with neighborhood 
size k = 3. Fig. 7(e) shows the result got by PCA method. It 
can be seen that the data set is clustered, but the shape of each 
embedding is deformed because of the linear characteristic of the 
PCA method. Fig. 7(f) is the result got by k-CC Isomap method 
with neighborhood size k = 2>. The bad approximations of inter- 
manifolds geodesies lead to the deformation of the embedding in 
low dimensional space. Fig. 7(g) shows the result by M-Isomap 
method with neighborhood size k = 3. The data set is clearly 
clustered and intra-manifolds relationships are exactly preserved. 



11 



A 



crrrr 




> 




Fig. 6 

(a) the face data set of five people, (b) the result by PCA. (c) the result by k-CC Isomap (d) the result by M-Isomap. (e) the result by D-C Isomap. 




Fig. 7 

(a) Data points of the teapot data set, '□' stands for teapot vertical view rotation, stands for the teapot side view back-forth rotation and 'O' stand for 

teapot side view rotation, (b) The result of Isomap on the teapot vertical view rotation set. (c) The result of Isomap on the teapot side view back-forth 
rotation set. (d) The result of Isomap on the teapot side view rotation set. (e) The result of PCA on teapot data set. (f) The result of k-CC Isomap on teapot 
data set. (g) The result of M-Isomap on teapot data set. (h) The result of D-C Isomap on teapot data set. 



Fig. 7(h) shows the result got by revised D-C Isomap method 
with neighborhood size A: = 3. It can be seen that the revised 
algorithm produces a satisfying result. 

Fig. 8(a) shows samples of the IsoFACE and teapot rotation 
bird-view data. IsoFACE data consists of 698 images and each 
image is a 64x64 (4096-dimensional) gray scale picture. As 
the input dimension of IsoFACE data set is different from the 
input dimension of teapot data set, we increase the dimension of 
IsoFACE set by adding zeros to the bottom of these vectors. The 
scale of the teapot data set should also be changed such that the 
scales of two embedding can be comparable. Teapot data vectors 
are divided by 100, i.e. the scale of teapot data points shrink 
to of its original ones. Fig. 8(b) is the 3-D embedding of 
IsoFACE by classical Isomap with neighborhood size k=5. Fig. 
8(c) is the scaled teapot 3-D embedding got by classical Isomap 
with neighborhood size k = 5. Fig. 8(d) is the result got by 5-CC 



Isomap method. We can see that the shape of IsoFACE data is 
distorted badly. Fig. 8(e) is the result got by M-Isomap method 
with neighborhood size k=5. The performance is significantly 
improved compared with k-CC Isomap. Fig. 8(f) is the result 
got by the revised D-C Isomap method. 

C. Discussion 

In our experiments, there are several important properties which 
should be considered: 

1) As k-CC Isomap tries to preserve poor and good approx- 
imations of geodesies simultaneously, its low dimensional 
embedding is usually deformed. This method works well if 
each data manifold has comparable number of data points 
and the data manifolds can not be very far from each other, 
and the algorithm does not work well otherwise. 
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Fig. 8 

(a) data points of the IsoFACE and teapot data set, where '•' STANDS FOR THE IsoFACE DATA SET AND 'O' STANDS FOR THE TEAPOT VERTICAL VIEW DATA SET. (b) THE 
RESULT OF ISOMAP ON IsoFACE DATA SET. (c) THE RESULT OF IsOMAP ON TEAPOT VERTICAL VIEW DATA SET. (d) THE RESULT OF K-CC IsOMAP ON IsoFACE AND TEAPOT DATA SET. 
(e) the RESULT OF M-ISOMAP ON IsoFACE AND TEAPOT DATA SET. (f) THE RESULT OF D-C IsOMAP ON IsoFACE AND TEAPOT DATA SET. 



TABLE III 

The GENERALIZATION PERFORMANCE OF CLASSICAL IsOMAP, K-CC IsOMAP. ORIGINAL 
D-C IsOMAP, REVISED D-C IsOMAP AND M-IsOMAP FOR MULTI-MANIFOLDS LEARNING. 



method 


density 


dimensionality 


Isometric 


generalization 


classical 


A 


A 


A 


A 


k-CC 


o 


o 


A 


□ 


Original D-C 


o 


o 


O 


□ 


revised D-C 


o 


o 


O 


O 


Multi- 


o 


o 


O 


o 



2) The revised D-C Isomap overcomes its original limitations, 
meanwhile, the robustness of the algorithm is also enhanced 
by adding a new cluster. 

3) M-Isomap connects data manifolds with multiple edges, 
which can control the rotation of the low dimensional 
embedding, and at the same time, better preserve inter- 
manifolds distance. Like the D-C Isomap algorithm, it can 
also isometrically preserve intra-manifold geodesies and 
inter-manifolds distances. 

To sum up. Table III shows the comparison of the general 
performance of the five versions of Isomap algorithms: classical 
Isomap, k-CC Isomap , Original D-C Isomap, revised D-C Isomap 
and M-Isomap. The labels "A" stands for poor performance, "□" 
stands for not bad and "O" stands for good. Density means the 
generalize ability on manifolds with diff^erent density, i.e. diff^erent 
neighborhood sizes; dimensionality means the generalization abil- 
ity on manifolds with diff^erent intrinsic dimensionality; isometric 
means the property of isometry in preserving the inter and intra- 
manifold relationship; finally the generalization means the overall 
generalization ability to learn data from multiple manifolds. 

VI. Conclusion 

In this paper, the problem of multi-manifolds learning is 
presented and defined for the first time. A general procedure for 



isometric multi-manifolds learning is proposed. The procedure 
can be used to build multi-manifolds learning algorithms which 
are not only able to faithfully preserve intra-manifold geodesic 
distances, but also the inter-manifolds geodesic distances. M- 
Isomap is an implementation of the procedure and shows promis- 
ing results in multi-manifolds learning. Compared with k-CC 
Isomap, it has the advantage of low computational complexity. 
With the procedure, the revised D-C Isomap becomes more 
eff^ective in learning multi-manifolds data sets. Future work will 
be conducted on the applications of the multi-manifolds learning 
algorithms. 
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Abstract — Isometric feature mapping (Isomap) is a promising 
manifold learning method. However, Isomap fails to work on data 
which distribute on clusters in a single manifold or manifolds. 
Many works have been done on extending Isomap to multi- 
manifolds learning. In this paper, we first proposed a new multi- 
manifolds learning algorithm (M-Isomap) with help of a general 
procedure. The new algorithm preserves intra-manifold geodesies 
and multiple inter-manifolds edges precisely. Compared with 
previous methods, this algorithm can isometrically learn data 
distributed on several manifolds. Secondly, the original multi- 
cluster manifold learning algorithm first proposed in [24] and 
called D-C Isomap has been revised so that the revised D-C 
Isomap can learn multi-manifolds data. Finally, the features and 
effectiveness of the proposed multi-manifolds learning algorithms 
are demonstrated and compared through experiments. 

' Index Terms — Isomap, nonlinear dimensionality reduction, 
manifold learning, pattern analysis, multi-manifolds learning. 
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I. Introduction 

' Challenges, known as "the curse of dimensionality", are usually 
fonfronted when scientists are dealing with high dimensional 
data. Dimensionality reduction is a promising tool to circumvent 
these problems. Principal component analysis (PCA) [1] and 
multidimensional scaling (MDS) [2] are two important linear 
dimensionality reduction methods. Due to their linear model 
assumption, both of the methods will fail to discover nonlinear 
intrinsic structures of data. 

Recently, there are more and more interests in nonlinear dimen- 
sionality reduction (NLDR). NLDR is used to learn nonlinear 
intrinsic structures of data, which is considered to be the first 
step of machine learning and pattern recognition [3]. Two 
interesting nonlinear dimensionality reduction methods based on 
the notion of manifold learning [6], isometric feature mapping 
(Isomap) [4] and local linear embedding (LLE) [5], have been 
introduced in SCIENCE 2000. LLE assumes that data points 
locally distribute on a linear patch of a manifold. It preserves local 
linear coefficients, which best reconstruct each data point by its 
neighbors, into a lower dimensional space. Isomap is based on the 
classical MDS method. Instead of preserving pairwise Euclidean 
distance, it preserves the geodesic distance on the manifold. 
The geodesic between two data points is approximated by the 
shortest path on a constructed graph. Both of the methods are 
computationally efficient and able to achieve global optimality. 
There are also many other important nonlinear dimensionality 
reduction methods. Laplacian eigenmap [7] utilizes an approxi- 
mation of the Laplace-Beltrami operator on manifolds to provide 
an optimal embedding. Hessian LLE [8] resembles Laplacian 
eigenmap by using an approximation of the Hessian operator 
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instead of Laplacian operator. The local tangent space alignment 
(LTSA) [9] method learns the local geometry by constructing 
a local tangent space of each data point and then aligns these 
local tangent spaces into a single global coordinates system 
with respect to the underlying manifold. Diffusion maps [10] 
applies diffusion semigroups to produce multi-scale geometries 
to represent complex structures. Riemannian manifold learning 
(RML) [11] method uses the constructed Riemannian normal 
coordinate chart to map the input data into a lower dimensional 
space. NLDR is a fast growing research activity and has been 
proved very useful in many fields and applications, such as 
classification using Isomap [16] and Laplacian eigenmap [17], 
geometry based semi-supervised learning method using Laplacian 
eigenmap [18], data visualization [19] and time series manifold 
learning [20], [21]. 

As Isomap emphasizes on the global geometric relationship of 
data points, it is very illustrative in data visualization and pattern 
analysis [13]. Although Isomap algorithm implicitly requires the 
data set to be convex [8], it still provides very meaningful 
results on non-convex data sets. In this paper, we will focus on 
extending Isomap to multi-manifolds learning. The first step of 
Isomap algorithm is to construct a neighborhood graph which 
connects all the data points. This step is of vital importance 
because the success of the following steps depends on how well 
the constructed neighborhood graph is. However, it is hard to build 
a totally connected neighborhood graph in order to guarantee the 
topological stability of the classical Isomap algorithm when points 
of the data set distribute on clusters in a manifold or manifolds 
(multiple manifolds). It should be remarked that several methods 
have been proposed to extend Isomap to multi-manifolds data, and 
some of them are based on providing new neighborhood graph 
construction algorithms. For example, Wu et al [23] introduced a 
split and augment procedure for neighborhood graph construction 
which could produce a totally connected neighborhood graph. 
In a series of papers [26]-[29], Yang introduced several neigh- 
borhood graph construction algorithms using techniques from 
discrete mathematics, graph theory. Meng et al [24] proposed a 
decomposition and composition Isomap (D-C Isomap). 

The rest of the paper is organized as follows. In Section |lll 
the main issues and limitations of the classical Isomap algorithm 
are presented. The problem of multi-manifolds learning is also 
investigated. In Section [nil previous methods on multi-manifolds 
learning are briefly introduced and discussed. In Section |IV| a 
general procedure for designing multi-manifolds learning algo- 
rithms is first proposed. With the proposed procedure, a new 
multi-manifolds learning algorithm (M-Isomap) is then designed 
and analyzed. In addition, the original D-C Isomap algorithm is 
revised to overcome its main limitation. In Section [V] the effec- 
tiveness of these multi-manifolds learning algorithms has been 
demonstrated by experiments. Comparisons of these algorithms 
have also been made. Some concluding remarks are provided in 
Section rvil 



2 



n. Classical Isometric Feafure Mapping and Its Levhtations 

Isomap is an efficient NLDR algorithm to recover the intrinsic 
geometric structure of a data set if the data points lie on a single 
manifold [4]. Assume that the data set X = {xi,X2, - ■ ■ , JCw} is in 
a high dimensional space and the feature space is R.''. Then 
the classical Isomap algorithm has the following three steps [4]. 

Step 1: Identify the neighbors of all the data points to 

construct a neighborhood graph. With the given parameter k 
or e, there are two ways to construct a neighborhood graph 
forX: 

• if Xj is one of jc,'s k nearest neighbors, they are 
connected by an edge (the k-NN method). 

• if Xi and Xj satisfy \\Xi - Xj\\ < e, they are connected by 
an edge (the £-NN method). 

Step 2: Use Dijkstra's or Floyd-Warshall's algorithm to 
compute the length of the shortest path do^Xi, Xj) between 
any two data points x, and x, on the graph. It is proved that 
dc{Xi,Xj) is a good approximation of the geodesic distance 
duiXhXj) on the manifold as the number of data points 
increases. 

Step 3: Perform the classical MDS on the graph distance 
matrix Dq whose (i, element is ddi, j). Minimize the 
cost fimction 

EiY) = \\T(DG)-T{DY)\\F, 

The operator t is defined as t(D) = -^-HSH, where H = 
1 

/ ee^ with / the identity matrix and e = (1, 1, • • • , 1)^, 

n 

S = (Djj) with Dij being the (i, j)-th element of D and 
Dy = (Wyi-yjW)- Assume that, in descending order, A/ is the 
?-th eigenvalue of t(Dg) with the corresponding eigenvector 
V,. Then the low-dimensional embedding Y is given by: 



The property of the Isomap algorithm is well understood [12], 
[14]. However, the success of the Isomap algorithm depends on 
two issues. One is how to choose the correct intrinsic dimension- 
ality d. Setting a lower dimensionality d will lead to a loss of 
data structure information. On the other hand, setting a higher 
dimensionality d will make some redundant information to be 
kept. This issue has been well investigated so far. The other 
issue is the quality of the constructed neighborhood graph. It is 
known that constructing an appropriate neighborhood graph is 
still a tricky task. Both the k-NN and e-NN methods have their 
limitations. Under the assumption that data points distribute on a 
single manifold, if the neighborhood size A: or s is chosen to be 
too small, the constructed neighborhood graph will be very sparse 
and therefore the geodesies can not be satisfactorily approximated. 
On the other hand, if the neighborhood size or e is chosen to 
be too large, short-circuit edges may occur which will have a 
significant negative influence on the topological stability of the 
Isomap algorithm [22]. 

Nonetheless, if data points distribute uniformly on one mani- 
fold, then both the "short circuit" problem and the "discontinuity" 
problem can be circumvented by carefully choosing an appropri- 
ate neighborhood size k or e. However, if data points distribute on 



several clusters or manifolds, then neither of the k-NN method and 
the e-NN method can guarantee that the whole data set is totally 
connected and the geodesies is satisfactorily approximated. 

However, both data missing and data mixture are common 
problems in practical data analysis. These two cases often cause 
data points to distribute on different clusters in a manifold or 
manifolds. Data points on different manifolds may have different 
input dimensionality D (the dimensionality of the ambient space). 
This usually happens in the case of data mixture. On the other 
hand, learning different data manifolds may need different values 
of input parameters, that is, appropriate neighborhood size (k or b) 
and intrinsic dimensionality d for each data manifold. In this pa- 
per, we will focus on designing new multi-manifolds learning 
algorithms for data distributing on multiple manifolds. The 
case when data points distribute on pieces of a single manifold 
is referred to as multi-cluster manifold learning, while the case 
when data points distribute on multiple manifolds is referred 
to as multi-manifolds learning. 

III. Previous Works on Multi-Manifolds Learning 

A. Multi-manifolds learning by new neighborhood graph con- 
struction method 

Wu and Chan [23] proposed a split-augment approach to 

construct a neighborhood graph. Their method can be regarded as 
a variation of the A:-NN method and can be summarized as below: 

1. the fc-NN method is applied to the data set. Every data 
point is connected with its neighbors. If the data lies on 
multiple manifolds, several disconnected graph components 
(data manifolds) will be formed. 

2. Each couple of graph components are connected by their 
nearest couple of inter-components points. 

This method is simple to implement and has the same compu- 
tational complexity as the fc-NN method has. However, as there is 
only one edge connecting every two graph components, geodesies 
across components are poorly approximated and, meanwhile, 
their low dimensional embedding can be rotated arbitrarily. This 
method can not be directly applied to data lying on three or more 
data manifolds. If more than two graph components exist, the 
intra-component shortest distances may be changed in the totally 
connected graph. 

Yang [26]-[29] introduced four methods to construct a con- 
nected neighborhood graph. The k minimum spanning trees {k- 
MST) method [26] repeatedly extracts k minimum spanning trees 
(MSTs) from the complete Euclidean graph of all data points. 
Edges of the k MSTs form a ^-connected neighborhood graph. 
Instead of extracting k MSTs, the minimum-A:-spanning trees 
(min-Zc-ST) method [27] finds k edge-disjoint spanning trees from 
the complete Euclidean graph, and the sum of the total edge 
length of the k edge-disjoint spanning trees attains a minimum. 
The A:-edge-connected (A:-EC) method [28] constructs a connected 
neighborhood graph by adding edges in a non-increasing order 
from the complete Euclidean graph. An edge is added if its two 
end vertices do not have k edge-disjoint paths connected with each 
other. The A:-vertices-connected (A:-VC) method [29] adds edges in 
a non-increasing order from the complete Euclidean graph, where 
an edge is added if its two end vertices would be disconnected by 
removing some k - 1 vertices. And the constructed neighborhood 
graph would not be disconnected by removing any k-\ vertices. 

The methods introduced in [26]-[29] have the following advan- 
tages over the fc-NN method. First, the local neighbor relationship 
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is affected by the global distribution of data points. This is 
beneficial for adaptively preservation of the global geometric met- 
rics. Secondly, these methods can guarantee that the constructed 
neighborhood graph is totally connected. Compared with the k- 
NN method, Yang's methods construct a neighborhood graph with 
more edges corresponding to the same neighborhood size k. This 
property ensures the quality of the neighborhood graphs. 

B. Multi-manifolds learning by decomposition-composition 
Isomap 

In [24], Meng et al. proposed a decomposition-composition 
method (D-C Isomap) which extends Isomap to multi-cluster 
manifold learning. The purpose of the method is to preserve intra- 
cluster and inter-cluster distances separately. In the next section, 
we will introduce a revised version of the D-C Isomap to extend 
the application range of the original D-C Isomap. To this end, we 
present the details of the D-C Isomap algorithm as follows. 
Step I: Decomposition process 

1. Given an appropriate neighborhood size k or s, if data 
is comprised of multiple clusters, several disconnected 
graph components will be formed when each data point 
is connected with its neighbors by the fc-NN or e-NN 
method. 

2. Assume that there are M graph components and a graph 

component is also considered as a cluster. Data points of 
the m-th cluster is denoted as X'" = {x^, ■ ■ • ,x'^}. Clusters 
and X" are connected by their nearest inter-cluster 
edge, whose ending vertices are assumed as «.jc^ and n.jc^ 
and edge length as „. 

3. Apply the ^-NN Isomap or e-NN Isomap on cluster 
Z"". Denote by D™ = (£>™ ) the geodesic distance matrix 
for X™, by F" = {/[', ,>'"} the corresponding low- 
dimensional embedding to and by wy™ the embedding 
point corresponding to where ny^ e Y"'. 

Step II: Composition process 

1. The set of centers of clusters is denoted as CX = 
{cxi, ■■■ , cxm), where each center is computed by 

cx„ = arg min max(D"') w = 1, • • • , M. 

2. The distance matrix for CX can be computed by 

n-in \ n _ / + + dn.m m*n 
u - iw™), - I 0, m = n 

where £/„, „ is the distance of the shortest path between cx^ 
and ny^ on the graph component X™. 

3. Plug the distance matrix D and the neighborhood size d 
into the classical Isomap algorithm. The embedding of 
CX is denoted by CY = [cyi,--- ,cyM] c R'' (CY is 
called the translation reference set). Assume that the d 
nearest neighbors of cx„, are {cx,„, , • • • , cx,„J. Then the low- 
dimensional representation corresponding to nx'^. can be 
computed as 

SYm, =Cym+ -J) (Q'm, " Q'm), 

^m,mj "t" ayf^ yf^. -t- dfni,ni 

where i = 1, • • • ,d and m = 1, ■ ■ ■ ,M. 

4. Construct the rotation matrix ^„ for 7" with m= 1, • • • ,M. 
Assume that QNm is the principal component matrix of 



NY,„ = {ny^^,--- ,ny'J^J and QS^ is the principal compo- 
nent matrix of SY„ = {i}^!,--- ,symj- Then the rotation 
matrix for Y"' is Jl^ = QS^QN^. 
5. Transform 7" (m = I, - - ,M) into a single coordinate 
systemby Euclidean transformations: 

FY'- = IfyT = m„,yT + cy„,i = I, - ■ ■ ,U, m= ,M. 

Then Y = Uff=i FY'" is the final output. 
First, the D-C Isomap algorithm reduces the dimensionality 
of clusters separately and meanwhile, preserves a skeleton of 
the whole data. Secondly, using the Euclidean transformations, 
the embedding of each cluster is placed into the corresponding 
position by referring to the skeleton. In this way, the intra- 
cluster geodesies are exactly preserved. Since the D-C Isomap 
method uses circumcenters to construct the skeleton of the whole 
data, its learning results depend on the mutual position of these 
circumcenters, which would make the learning results unstable. 
On the other hand, it is known that at least d + I reference points 
are needed to anchor a (/-dimensional simplex. However, in the 
D-C Isomap algorithm, the number of the reference data points 
is limited by the ntunber of clusters. 

C. Constrained Maximum Variance Mapping 

Recently in [25], Li et al. proposed the constrained maximum 
variance mapping (CMVM) algorithm for multi-manifolds learn- 
ing. The CMVM method is proposed based on the notion of 
maximizing the dissimilarity between classes while holding up 
the intra-class similarity. 

IV. Isometric Multi-Manifolds Learning 

In this section, we first introduce a general procedure for the 
design of isometric multi-manifolds learning algorithms and then 
present our new multi-manifolds learning algorithm called M- 
Isomap. Finally, we make a revision of the original D-C Isomap 
algorithm to extend its application range. 

A. The general procedure for isometric multi-manifolds learning 

Many previous methods extend Isomap to multi-manifolds 
learning by revising the neighborhood graph construction step 
of the Isomap algorithm [23], [26]-[29]. However, the shortest 
paths across clusters or data manifolds are bad approximations 
of geodesies. In Isomap, bad local approximation always leads to 
deformation of the global low-dimensional embedding. 

Assumed that Q is an open, convex and compact set in R"* 
and f : Q. —> is a continues mapping, where d « D. Then 
/(D) = is defined as ad dimensional parameterized manifold. 
Let K(x,y) (x,y e M) be a specially defined kernel. Then a 
reproducing kernel Hilbert space (RKHS) is constructed with 
this kernel. Denote by (pj(x) the eigenfunction corresponding to 
the 7-th largest eigenvalue Aj of K(x, y) in J^, which is also the 
j-th element of the Isomap embedding. The geodesic distance on 
the manifold M is written as 

dHx,y) = dHf(T), fir)) = a\\T - t\\ + 77(7, t), 

where r, r e Q, or is a constant and //(t, t) is the deviation from 
isometry. The constant vector 

^_ fj^xp(x)dx _ J^rH{T)dT 
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TABLE I 

Symbols and Variables used in the Algorithms 

X = {jcil^j The total data set, with x,- e 

X'" = {.x"'}^'" The m-th data manifold, where m=l,--- ,M 

Y'" = {y"']j^\ Low dimensional embedding of X'" 

Dmn = (dcix^, Matrix of geodesic distances across data manifolds X'" and X" 

fx^\ fjt^. The furthest couple of data points in X" and X", with /„"' e X"' e X" 

(■^I^o'fci Subset of X'" whose elements are the nearest data points to X" 

7™ = The selected data points from X" to construct the skeleton 7 

Y'" Low-dimensional embedding of I'" with Y'" c Y'" 

I = Um=i Points which are used to construct a skeleton of X 

D/ Approximated geodesic distance matrix for skeleton 7 

RYj Low-dimensional embedding of the skeleton 7 

RY'p Low-dimensional embedding of F" with RY'," c RYj. RY"/ will also 

be referred to as the transformation reference for 7"' 

nx'. The point in X' which is the nearest to X' or the inter-cluster point 

in the D-C Isomap algorithm. 



where p(x) and "Hir) are density functions of A\ and Q.. With 
the above notations, the following theorem is proved by Zha et 
al [12]. 

Theorem 4.1: There is a constant vector Pj such that (pj{x) = 
P^(t — C) + e,(T). Here e;(T) = e*"' - e,(T) has zero mean, that is, 

j 'H{T)ej(r)dT = 0, where 

^/W = ^ r riir,TmT)'^j{x)dT, 
ef = r ej{T)'H(T)dT. 

By Theorem 14.11 even if the deviation ?7(t, t) is not zero 
with only a limited range of (t, t), then the coordinate of the 
low-dimensional embedding (pj(x) is still deformed with the 
deformation being measured by e/r). 

In order to get a better understanding of multi-manifolds data, 
it is profitable to preserve intra-manifold relationship (where 
?7(r, r) = 0) and inter-manifolds relationship (where rj{T, t) 0) 
separately. This is because sometimes we care more about the 
information within the same data manifold. Here we propose 
a general procedure for the design of isometric multi-manifolds 
learning algorithms. 

Step L The decomposition process 

1. Cluster the whole data set. If data distribute on multiple 
clusters in a manifold or manifolds, the clusters or man- 
ifolds should be identified. Many clustering methods can 
be used for this; for example, K- means, Isodata and other 
methods introduced in [15], [33]. Even if the manifolds 
overlay with each other, they can still be identified and 
clustered [39]. At this step, the data set X is clustered into 
several components, and each component is considered as 
a data manifold. 

2. Estimate parameters of data manifolds. For intrinsic di- 
mensionality estimation, many methods can be used: for 
example, the fractal based method [34], the MLE method 
[35], [36] and the incising ball method [37]. Assume that 
dm is the intrinsic dimension of the m-th data manifold. Let 
d = maxd,,,. For the neighborhood size, [32] introduces a 

m 

method on automatically generating parameters for Isomap 



on one single data manifold. For convenience, appropriate 
neighborhood sizes (k„j or e„, for X'") can be given manually 
for data manifolds. 
3. Learn the data manifolds individually. One data manifold 
can be learned by traditional manifold learning algorithms. 
Here, we propose to rebuild a graph on each data manifold 
with a new neighborhood size to better approximate the 
intra-manifold geodesies. In doing so, Yang's methods 
[26]-[29] and theA:-CG graph construction method are 
preferred, where thek-CG graph construction method will 
be described later. It is assumed that the low-dimensional 
embedding for X'" is Y"'. 

Step II: The composition process 

1. Preserve a skeleton 7 of the whole data set in a low- 
dimensional space R''. The skeleton 7 should be carefully 
designed so that it can represent the global structure of X. 
Let RYj be the low-dimensional embedding of 7. 

2. Transform F"'s into a single coordinate system by referring 
to RY]. In order to faithfully preserve the intra-manifold 
relationship, Euclidean transformations can be constructed 
and used. Using the embedding points RY"' c RY' and 
the corresponding points from Y"\ we can construct an 
Euclidean transformation from Y'" to the coordinate system 
of RY,. 

The idea of using a decomposition-composition procedure 
is not new, which was first used by Wu et al. [23] in their 
split-augment process and well developed and used in [24]. 
The procedure we proposed here aims to solve a more general 
problem. Step 1. 1 permits that the designed learning algorithm has 
a good ability to identify data manifolds. Step 1.2 gives a guideline 
on learning manifolds with diff^erent intrinsic dimensionality and 
neighborhood sizes. Step 1.3 learns data manifolds individually so 
that the intra-manifold relationship can be faithfully preserved. 
Step II. 1 is the most flexible part of the procedure which 
allows us to design new isometric multi-manifolds learning 
algorithms. A well designed skeleton 7 can better represent the 
inter-manifolds relationship. In the following subsections, we will 
introduce a new multi-manifolds learning algorithm and revise 
the original D-C Isomap algorithm with the help of this general 
procedure. 
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TABLE II 

Computational complexity comparison of k-NN, k-MSTs, Min-k-ST, k-EC 
and k-vc methods, where tc stands for time complexity and il stands for 
the time complexity for incremental learning 



k-NN 


k-MST 


Ming-k-ST 


k-EC 


k-VC 


TC OikN-) 






0{k-N-) 


0{N') 


IL O(kN) 


0{N\nN) 






0(N\nN + kN) 



B. A new algorithm for isometric multi-manifolds learning 

Based on the proposed procedure, we design a new multi- 
manifolds learning algorithm. As an extension of the classical 
Isomap method to multi-manifolds data, the method will be 
referred to as multi-manifolds Isomap or M-Isomap. It is as- 
sumed that X is also interchangeable to represent the matrix 
[xi,X2,--- ,xn], where jc;, i = \, - ■ ■ ,N are column vectors in 

1 ) Using the k-CC method to construct a neighborhood graph 
and identify manifolds: Table Ull shows the time complexity of the 
k-NN, K-Min-ST, k-EC and k-VC methods on the neighborhood 
graph construction. As shown in the table, the k-NN method has 
the lowest computational complexity O(kN^). 

For incremental learning, the computational complexity of 
the k-NN, k-MSTs and k-VC methods are O(kN), 0(N\nN) 
and 0{NhiN + kN), respectively [30], [31]. The computational 
complexity of the Min-k-ST and k-EC methods for incremen- 
tal learning are unavailable. For data on one single manifold, 
the improvement of performance of Yang's methods becomes 
insignificant when the neighborhood size k increases for the fe-NN 
method. More importantly, the k-NN method implicitly has the 
property of clustering to multi-manifolds data. Data points of the 
same manifold tend to be connected by paths and disconnected 
otherwise when each data point is connected with its neighbors 
by edges. Although the A:-NN method is not a robust clustering 
algorithm, it is computationally efficient for both clustering and 
graph construction. Therefore, we introduce a variation of the k- 
NN method which inherits the computational advantage of the 
k-NN method. The methodis able to identify data manifolds and 
construct a totally connected neighborhood graph. In the rest of 
the paper, the proposed neighborhood graph construction method 
will be referred as the ^-edge connected graph method (the k-CG 
method). 

The summary of the k-CG algorithm is follows. First, given 
a neighborhood size k or s, each data point is connected with 
its neighbors. If the data points distribute on several clusters 
or manifolds, several disconnected graphs will be constructed. 
Data points are assigned to the same data manifold if there is 
a path connecting them on the graphs. Then, we connect each 
pair of graphs by k nearest pairs of data points. For robustness 
of the algorithm, each data point is only allowed to have one 
inter-manifolds edge at most. 

Algorithm 4.1: (The A^-CG algorithm:) 

Input: Euclidean distance matrix D, whose (;', 7')-th entry is \\xi- 
XjW and neighborhood size k or e. 

Output: Graph G = (V, E), the number M of clusters, label of 
the data. 

Initialization: V = {xi, - ■ ■ , x^], V = V, E = Queue = ^ 
1: for i=l to do 



2: Identify nearest neighbors {xn, • • • , xn.] for Xi by A:-nearest- 
neighbors or £-nearest- neighbors. Let E = E(J{eii, - ■ ■ , en.) 

3: end for 

4: Set M = 1 

5: while V is not empty do 

6: xe V, in-Queue{x|, label(x)=M, V = V - {x] 

7: while Queue is not empty do 

8: x=de-Queue 

9: Vy: y is connected with x by an edge 

10: if y is not labeled then 

11: in-Queue{y), label(y)=M, V = V - {y] 

12: end if 

13: end while 

14: M=M-Hl 

15: end while 

16: M=M-1 

17: if M > 2 then 

18: k =average({5, |^^j { where Si is the number of neighbors of 

19: for ! = 1 to M do 

20: for 7 = ! -H 1 to M do 

21: Find k shortest inter-manifolds edges ei,--- ,ek be- 

tween data manifolds and j and make sure that 
their ending vertices are not identical. Let E = 
E\J{ei,--- ,ek\ 

22: end for 

23: end for 

24: end if 

The main difference between the A:-NN method and the k- 
CG method is lines 4 to 25, which identify components (data 
manifolds) and connect different components of the graph. This 
change makes the constructed graph totally fc-edge connected. 
Compared with the method proposed in [23], the k-CG method 
constructs a neighborhood graph with k inter-manifolds edges, 
which is able to control the rotation of the embedding of the data 
manifolds. In Section |Vl the k-CG Isomap method, which uses 
the k-CG method to construct a totally connected graph and then 
perform the classical Isomap on the graph, is compared with the 
M-Isomap method. It can be easily seen that the k-CG Isomap 
suffers the limitation which has been shown by Theorem 14. ll We 
assume that is the subset of X" whose data points 

connect with manifold X"*'*, / = 1, • • • ,k. 

2) Learn data manifolds individually: As X™ is considered as 
a single data manifold in R.^, it is possible to find its intrinsic 
parameters. The incising ball method [37] is utilized to estimate 
the intrinsic dimensionality, which is simple to implement and 
always outputs an integer result. Assume that d is the highest 
intrinsic dimensionality of data manifolds. The neighborhood size 
k^ or £,„ of each data manifold is given manually and the graph 
on the data manifold X'" is rebuilt. It is expected that the new 
neighborhood graph on X'" can give better approximations to 
the intra-manifold geodesies. The approximated geodesic distance 
matrix for X"' is written as D,„. By applying the classical MDS 
on D,„, the low-dimensional embedding for X™ can be obtained 

as y"' = |y;"ifj;. 

3) Preserve a skeleton of the data manifold X: First, inter- 
manifolds distances are computed. Assuming that xP^ and x"^ are 
any data points with x'" e X™ and x" e X" , their distance can be 



6 



computed by 

doi^, xl) = rmnidGix;, -^fw) + + doK^,,,, ^,)}, (1) 

where ddx^, ■"Q;)-' ''^^ shortest path on the neighborhood graph 
of X"\ Although may not be the shortest path on 

the totally connected graph of X, Eq. ([TJ is an efficient way to 
approximate distances across manifolds. Assume that D„,„ is the 
distance matrix across over X'" and X". Then the furthest inter- 
manifolds data points are computed by 



{fx';:, fx';„] = arg max dc(^;\ .^jl d^x'^, x]) e D„ 
Without loss of generality, we may assume 



(2) 



Then / = U„,=i is considered as the global skeleton of X. On 
the data manifold X, it can be seen that the skeleton / formulates 
a sparse graph. We assume that Dj = (dj(i, j)) is the distance 
matrix of /, where 



di{i, j) : 



dc{.x;\x'peD„,„, XieX'",xjeX" 
dG(x'!',x^) e D„„ Xi,XieX"' 



(3) 



By applying the classical MDS algorithm on Dj, the low- 
dimensional embedding RYj of / can be obtained. It is assumed 
that /?FJ" = c RY, is the embedding of /"'. 

4) Euclidean transformations: Assume that ¥'" = iy"')!'"! c F"' 
and yj' corresponds to jc™. Then the Euclidean transformation 
from F™ to RY"' can be constructed as follows. 

The general Euclidean transformation can be written as 

ry = jHy + j3, 

where ^ is an orthonormal matrix and /J is a translation vector. 
For the m-th data manifold, it is assumed that the Euclidean 
transformation is 



ry;' = :n^f,' + fi,n. 



1, 



The above Euclidean transformation can be rewritten in the matrix 
form: 



RY'! 



(4) 



where e is a vector with all ones. Equation ^ can be solved 
using the least square method, and the solution is given by 

-1 



RY',' 



AI 



(5) 



where / is the identity matrix and A is a regularization parameter 
in the singular case. However, the least square solution does not 
necessarily provide an orthonormal matrix We now propose 
to use the QR decomposition to get the orthonormal matrix ^„,. 
The QR process can be written as 



R) = QR{:n,„) 



(6) 



where the diagonal elements of R are forced to be nonnegative. 
Then jS,,, can be recomputed by minimizing the cost function 



C(f}„,) = J]m,„y';'+fi,„-ry';t 



Solving the equation 



dC(fi,„) 
df3„, 



gives 



. i„. 



(7) 



The low-dimensional embeddings Y,„ (i = I,-- - , M) can be 
formed into a global coordinate system using the constructed 
Euclidean transformations. 

5} The M-Isomap algorithm: The detailed M-Isomap algo- 
rithm is summarized in the following table. 



Input: 



W 



X 

size k or e 



jCjij.^j with Xi € '. 



Initial neighborhood 



Step I.l Perform the k-CG algorithm on X. Data 

manifolds {X'"}'^ , and the set of inter-manifolds 
points {jt^J.j)*^j of X'" can be obtained. 

Step 1.2 Estimate parameters of the data manifolds. 

Assume that the intrinsic dimension d,„ and 
neighborhood size (fe,„ or e,„) are parameters for 
X'". Let d = max{d,n] and rebuild the neighbor- 

m 

hood graph on X"'. 
Step 1.3 Classical Isomap algorithm is performed on X'" 
with new neighborhood graph, {m = 1, ■ ■ • , M). 
The corresponding low-dimensional embedding 
of X"' is denoted by F". 
Step II. 1 Inter-manifolds distance matrix D,„„ is computed 
by Eq. (|T]|; thus {fx'"]'^^^ can be found by Eq. 
(O. Distance matrix D/ for the skeleton / is 
computed by Eq. ([S}. Classical MDS is performed 
on Di to obtain the low-dimensional embedding of 
/, which is written as RYi. Assume that RY'I' c RYi 
is the embedding of Z™. 
Construct Euclidean transformations by 
Eqs. ([5]l-(|71i. Use the Euclidean transformations 
to transform Y"' into RY"' , m = 1, ■ • • , M. 



Step II.2 

Step II.3 Y = U",i RY'" is the final output. 

C. Computational complexity of the M-Isomap algorithm 

Computational complexity is a basic issue for application. 
In the M-Isomap method, the k-CC algorithm needs 0{{k + 
l)N^) time to construct a totally connected graph and identify 
the manifolds. It needs OC^ff^^f Sl,lnS m) time to compute the 
shortest path on each data manifold and 0(2^^; 5,"),) time to 
perform the classical MDS on the distance matrices of data 
manifolds. The time complexity of computing the shortest path 
across data manifolds is 0{X,m<n^^ mS n) and that of finding 
fx'., fx', is 0(2m<n ^'m'S,,). Performing the classical MDS on the 
skeleton I needs 0((2m=i linf ) computational time. The time 
complexity of finding the least square solution and processing 
the QR decomposition for M data manifolds is O(Md^). Finally, 
transforming Y'"s into a single coordinate system needs 0(d^N) 
computational time. 

Therefore, the total time complexity of the M-Isomap method 

is 

M M 

0{(k + l)N^ + J]{Sl + Sl InS,,,) + J](k + l)S,nS„ 

m= I m<}i 

M 
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Fig. 1 

Two BASIC CASES OF THE RELATIONSHIP BETWEEN THE CENTER OF EACH CLUSTER OR 
MANIFOLD AND THE INTER-MANIFOLDS POINTS 




Fig. 2 

An ILLUSTRATION OF HOW TO ADD A NEW CLUSTER FOR D-C IsOMAP ALGORITHM. 



where 



For a large data set where N » M and N » d, the overall time 
complexity of the M-Isomap algorithm can be approximated by 

M M 

Oi(k + \)N^ + YjiSi + si \nS„,) + Y,ik + \)S,„Sn). 



D. The revised D-C Isomap method 

D-C Isomap applies the decomposition-composition procedure. 
Therefore, it is able to preserve intra-cluster distances correctly. 
However, this method suffers from several limitations. In the 
following, the original D-C Isomap algorithm will be revised to 
overcome its limitations. 

1 ) Selection of centers: D-C Isomap implicitly assumes that 
the inter-cluster point nx"" is on the line which connects centers 
0„, and Thus it is more sensible that 0,„ is chosen by 

referring to the inter-cluster points. Fig. [T] illustrates two basic 
cases about the relationship of the center and inter-cluster points. 
Although the points nx\, nx^, njCj, nxl, and Oi do not have to 
really lie on the same plane in the ambient space. It is assumed 
that these points formulate a triangle in the low-dimensional 
space. Fig.[T](a) shows the case when /.nx\nx^^nx\ + lnx^^nx\nx^^ < 
180°. 

In the triangle h.O\nx\nx^^, the edge d{nx\,nx\) can be com- 
puted as \\nx\ - nx\\\. We also have 



lOinxfnxl = arccos 



LO\nx[ni?[ 



< nx\ — nx^,nx^^ — nXj > 
||nx[ — wXjIIIInx] — nXjII 

< nx2 — nx^ 



J, — njCj > 



\\nx\ — nxh\\\nx^ — nx' 



Subsequently, the length of edges d(0\,nx^^) and d(0\,nx\) can 
be calculated by the Law of Sines in the triangle l^O\nx\nx^^. 
Suggested distances between the center 0\ to the inter-cluster 
points can be calculated as 

d'(0\,nx\) = d{Oi,nx^} — \\nx^ — nx\\\ 
d'(0\,nx\) = J(Oi, HXj) - llnjCj - nx[||. 

For a cluster with the intrinsic dimension 2, it is sufficient to 
estimate the position of Oi in the cluster by solving the following 
optimization problem: 



Oi = arg min/(o), 

oeX, 



(8) 



f(o) = Y,\\d(Ounxl)-d'{Ounx]) 



Here, d(Oi,nxj) is the length of the shortest path between Oi 
and nx\ on the graph X' . For a cluster with intrinsic dimension 
d„,, at least d^ distances d'(Oi , «x' ), / = 1, • • • , d,„, are needed to 
estimate the position of the center Oi. In this case, /(o) is given 
by 



f(o) = Y,\\d(Ounx])-d'(Ounxj) 



If we can not find sufficiently many distances d'(Oi,nx]) to 
locate the center, there must be many inter-cluster points located 
in space, as illustrated in Fig. [T] (b). In this case, and when the 
center 0\ is never on the line passing through nx^nx^^ and nx\^nx^^, 
we have 

lnx\nx^nx^ > 180°. 



1 2 3 

lnx,nx,nx. 



In order to get a better preservation of the inter-cluster relation- 
ship, 0\ should be placed as far as possible from these inter- 
cluster points. For a cluster with intrinsic dimension 2, it is 
suggested that 0\ should be chosen as 



Oi = arg max{g(o)) 



(9) 



where 



g{o) = d(o, nx\) + d(o, nx^) — \\d{o, nx\) — d{o, nx\)\\ 

If the intrinsic dimension of is d^ and {nx\,i = 1, • • • ,d,„] is 
the set of inter-cluster points in X^, then the function g{o) should 
be given as 

dm 

g(o) = ^ (d{o, nx]) + d(o, wxj) - \\d(p, nx\) - d{o, nx^j)\\j 

•<j 

2) Degenerate and unworkable cases: Since the original D-C 
Isomap algorithm relies on the position of the center of each 
cluster to preserve the inter-cluster relationship, the algorithm 
does not work under certain circumstances. Consider a simple 
case of two data clusters with the intrinsic dimension d = 2. The 
method does not work in this case because it implicitly requires 
an another data cluster to provide sufficient rotation reference data 
points. Since the low-dimensional embedding of each cluster is 
relocated by referring to the position of the center of each cluster. 



In the case when there are three or more clusters and their centers 
are nearly on a line, the original D-C Isomap can not find the exact 
rotation matrix. 

This issue is solved in this paper by adding fictitious clusters. 
The algorithm applies a trial and error procedure to determine the 
position of the fictitious clusters. As an example, we now consider 
the case of two clusters. As shown in Fig. |2l the nearest couple 
of inter-cluster points of the clusters and are assumed to be 
nx\ and «JC|, and nii is the middle point between nx\ and nxy The 
second nearest couple of inter-cluster points are nxl, and njCj , and 
OT2 is the middle point between them. The third fictitious cluster 
is then suggested to be given by 



m2 — mi 
Wm - mil 



where the parameter y can be decided by a trial and error 
procedure. Given a positive value /? > I, X^ is assumed to satisfy 
that 



Input: 



Step I.l 
Step 1.2 



Step 1.3-4 



Step II.1 

Step II.2-4 
Step II.5 



X - 



T7V- 
'(=1 



with Xi e R". Initial neighborhood 



size k or e. 



Same as Step I.l of the original D-C 
Isomap algorithm. 

Estimate the parameters, intrinsic dimension 



and neighborhood sizes {{k, 
), of the clusters. Let d 



=1 

max d,„ 



or 
and 



{dm 

rebuild the neighborhood graph for each 
cluster. 

Same as Step 1.2-3 of the original D-C 
Isomap algorithm. 

Centers of the clusters are computed by ^ or 
([9]l. Fictitious clusters should be added until 
centers of the clusters can anchor a 
d-dimensional simplex. 
Same as Step II.2-4 of the original D-C 
Isomap. Assume that 7™ is transformed 
into TY'". 

Y = U,')f=i TY'" is the final output. 



1 ^ \\X 



■X' 



11X3 



(10) 



V. Experiments 



where \\X^ - X^\\ is the shortest distance between the data points 
from clusters X^ and X^ . If condition dlOt is not satisfied, then y 
can be chosen in a pre-given range such as 



-3,-2, 



-■■44 



In the case when there are M clusters in the data set with 
M < d + I, we can start from the couple of clusters with the 
maximum nearest inter-cluster distance. Assume that X^ and X^ 
satisfy that 



- X^W = maxminlir - 

'■ j 



with nx\, nx2, ffJi, nx^, nx^, m2 being defined as above. Then 
the {M + l)-th cluster X*^"^' can be generated as 



X" 



- mi + y\\nx^ 



m2 — mi 
\\m2 — mil 



If XP and XI are the two nearest clusters of X'^'^^, then, given 
/? > 0, it is assumed that X*'"^"' should satisfy 



1 



■XP\\ 



■xi\\ 



If M+ 1 < d+\, then replace M by M+ 1 and repeat the generating 
procedure presented above. 

PCA can be used to find the dimensionality of the subspace on 
which the centers are lying. If the dimensionality of the subspace 
is smaller than d, then fictitious clusters should be added until 
the centers of the clusters can anchor a J-dimensional simplex. 

3) The revised D-C Isomap algorithm: The revised D-C 
Isomap algorithm can be given as follows. 



A. 3-D data sets 



In this subsection, we compare the k-CC Isomap, the M-Isomap 
and the revised D-C Isomap on three 3-D data sets. It should be 
noted that in all experiments, the neighborhood size is chosen 
corresponding to the best performance of each algorithm. 

Fig.[3](a) is a two-manifolds data set with A' = 1200 data points, 
and the data set is generated by the following matlab code: 
t=(l*pi/6)*Cl+2*rand(l,N)) ; 
xx=t . "COsCt) ;yy=t . "sinCt) ; 

zz =[unifrndCl, 18, l,N/2) unifrnd(16, 25 , 1 ,N/2)] ; 
X=[xx; zz ;yy] ; 

It can be seen that each data manifold is intrinsically a rectangu- 
lar region with 600 data points. Fig.[3tb) shows the result obtained 
by the k-CC Isomap, whose neighborhood graph is constructed by 
using the 8-CC method. It can be seen that the embedding shrinks 
along the edges in the low-dimensional space and the edges of 
the embedding become noisy. Fig. [S^c) shows the result obtained 
by the M-Isomap method with the neighborhood size A: = 8. As 
can be seen, each data manifold is exactly unrolled, and the inter- 
manifolds distance is precisely preserved. Fig. [S^d) illustrates the 
initialization step of the revised D-C Isomap algorithm. First, the 
two data manifolds X' and X^ are identified. Then the third data 
cluster X^ is constructed, where the parameter X = 0.1. Finally, 
the centers Oi and O2 of the data manifolds are computed by 
referring to the nearest neighbors. The center of X^ is also the 
data point X^ . Fig.[3te) shows the result of the revised D-C Isomap 
method. It is seen that the embedding exactly preserves both the 
intra-manifold distances and inter-manifolds distances. 

Fig-Sa) is another two-marufolds data set with N = 1200 data 
points, and the data set is generated by the following matlab code: 

t=[unifrnd(pi*ll/12 ,pi*14/12 , 1 ,N/2) 
unifrnd(pi*16/12,pi*19/12, l,N/2)] ; 

xx=t . *cos(tt) ;yy=t.*sin(tt) ; 

zz=unifrnd(l , 25 , 1,N) ; 

Y = [xx;zz;yy] ; 

Each data manifold has 600 data points. One data manifold is 
a rectangular region and the other one is a round region. Fig. 
|4jb) shows the result obtained by the k-CC Isomap with the 
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Fig. 3 

Experiments on a two-manifolds data set. (a) The original data set. (b) The result obtained by the k-CC Isomap. (c) The result obtained by the M-Isomap. (d) 
Illustration of the procedure of the revised D-C Isomap. (e) The result obtained by the revised D-C Isomap. 




Fig. 4 

Experiments on a two-manifolds data set. (a) The original data set. (b) The result obtained by the k-CC Isomap. (c) The result obtained by the M-Isomap. (d) 
Illustration of the procedure of the revised D-C Isomap. (e) The result obtained by the revised D-C Isomap. 



neighborhood size k = 10. It can be seen that the recteingular 
region bent outwards and the round region is prolonged. Fig. 
|4jc) shows the result obtained by the M-Isomap method with the 
neighborhood size A: = 8. As can be seen, each data manifold is 
exactly unrolled, and the inter-manifolds relationship is precisely 
preserved. Fig. |3d) illustrates the initialization step of the 
revised D-C Isomap algorithm. The parameter A = 0.5 for the 
production of the new cluster X^. Fig. |4le) shows the result 
of the revised D-C Isomap method with the neighborhood size 
/c = 5. It can be seen that the embedding exactly preserves both 
the intra-manifold distances and inter-manifolds distances. 

Fig. [51 a) shows a three-manifolds data set with A' = 1600 data 
points on the Swiss roll manifold. The data set is generated by 



the following matlab code: 

tl = [unifrnd(pi*5/6,pi*16/12,l,N/4)] ; 
t2 = [unifrndCpi*18/12,pi*12/6, l,N/4)] ; 
t3=C5*pi/6)*(l+7/5*rand(l,N/2)) ; 
al=tl.*cos(tl) ; bl=tl.*sinCtl) ; 
cl= [unifrndC- 1 ,3,1, N/4) ] ; 
a2=t2 . ■•■cos(t2) ; b2=t2 . *sinCt2) ; 
c2= [unifrndC- 1,3,1, N/4)] ; 
a3=t3.*cos(t3) ; b3=t3 . '•sinCt3) ; 
c3= [unifrndCe , 18 , 1 , N/2) ] ; 

xl=[al;cl;bl] ; x2=[a2 ; c2 ;b2] ; x3=[a3;c3;b3] 
Z=[xl x2 x3] ; 

There are three rectangular regions on the Swiss roll manifold. 
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(a) 



; ' x„ 



(b) 



Fig. 5 

Experiments on a three-manifolds data set. (a) The original data set. (b) The result obtained by the k-CC Isomap. (c) The result obtained by the M-Isomap. (d) 
Illustration of the procedure of the revised D-C Isomap. (e) The result obtained by the revised D-C Isomap. 



The longest data manifold has 800 data points, and each of the 
other two shorter data manifolds contains 400 data points. Fig. 
|5jb) shows the result obtained by the k-CC Isomap algorithm 
with the neighborhood size k = 10. Due to the bad approximation 
of the inter-manifolds geodesies, edges of the data manifolds 
bend outwards. Fig. [5jc) shows the result obtained by the 
M-Isomap method, where the neighborhood size k is set to be 
8. As can be seen, all data manifolds are exactly unrolled, and 
the inter-manifolds relationships of the three data manifolds are 
precisely preserved. Fig. |5jd) illustrates the initiation step of the 
revised D-C Isomap algorithm. The result of the revised D-C 
Isomap method is presented in Fig. |5je). As seen in Fig. [5je), 
the embedding does not exactly preserve the inter-manifolds 
distances. This is because the shape of the data manifolds are 
very narrow. The selected reference data points can not efficiently 
relocate each piece of the data manifold. 

B. Real world data sets 

Fig. [Sfa) shows samples of the faces data [38] which contains 
face images of five person^. The data set consists of 153 images 
and has 34, 35, 26, 24, 34 images corresponding to each face. 
These images are gray scale with resolution of 112 x 92. They 
are transformed into vectors in a 10304-dimensional Euclidean 
space. In order to show the inter-manifolds relationship with more 
details, the data is embedded into a three-dimensional space. Fig. 
|6jb) is the three-dimensional embedding by the PCA method. 
It can be observed that data manifolds of faces are mixed up, 
and the intra-face information is not preserved. Fig. |6|c) is the 
result obtained by the k-CC Isomap method with k = 3. As seen 
from Fig. [6jc), although the data points are clustered, their inter- 
face distances are not well preserved. The five lines are mixed 
up at one of their endings. Fig. [Sfd) shows the result obtained 
by the M-Isomap method with A: = 3. Due to the limitation 
of the k-NN method in clustering, only two data manifolds are 

'http://www.cs.toronto.edu/ roweis/data.html 



identified. Although the data set is not well clustered, the result 
of the M-Ismap shows that the low-dimensional embedding can 
be separated easily. Fig. |6le) presents the result obtained by 
the original D-C Isomap method, where the faces are split up 
beforehand. The circumcenters are used as their centers. However, 
as can be seen, two faces are mixed up. 

Fig. |7ja) presents samples of the teapot data set with 300 data 
points, where '□' stands for the teapot bird- view images, 'A' 
stands for the teapot back-forth rotation images and 'O' stands 
for the teapot side-view images. Each image is an 80 x 60 x 3 
RGB colored picture, that is, a vector in a 14400-dimensional 
input space. The data points do not distribute on a single global 
manifold, which is a great challenge to the classical manifold 
learning methods. The experiments show that the three data 
manifolds can be identified by the k-CC method. In order to show 
their exact embedding. Fig. |7lb)-(d) present the embedding of 
each data manifold by the classical Isomap with the neighborhood 
size k = 3. Fig.|7le) gives the result obtained by the PCA method. 
It can be seen that the data set is clustered but the shape of 
each embedding is deformed because of the linear characteristic 
of the PCA method. Fig. |7jf) is the result obtained by the k- 
CC Isomap method with the neighborhood size A: = 3. The 
bad approximations of the inter-manifolds geodesies lead to the 
deformation of the embedding in the low-dimensional space. Fig. 
|7Ig) shows the result obtained by the M-Isomap method with the 
neighborhood size k = 3. From Fig.|7lg), it is seen clearly that the 
data set is clearly clustered and the intra-manifolds relationships 
are exactly preserved. Fig. |7jh) gives the result obtained by the 
revised D-C Isomap method with the neighborhood size k = 3. 
It is seen that the revised D-C Isomap algorithm also produces a 
satisfactory result. 

Fig. [SJa) shows samples of the IsoFACE and teapot rotation 
bird-view data set. The IsoFACE data set consists of 698 images 
and each image is a 64x64 (4096-dimensional) gray scale picture. 
Since the input dimension of the IsoFACE data set is diff"erent 
from that of the teapot data set, the dimension of the IsoFACE 
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Fig. 6 

(a) The face data set of five persons, (b) The result by PCA. (c) The result by the k-CC Isomap. (d) The result by the M-Isomap. (e) The result by the original 

D-C Isomap. 




Fig. 7 

(a) The teapot data set. In the experiments, '□' stands for the vertical view rotation of the teapot, stands for the side view back-forth rotation of the 
teapot and 'O' stand for the side view rotation of the teapot, (b) The result of Isomap on the teapot vertical view rotation set. (c) The result of Isomap on 

THE teapot side VIEW BACK-FORTH ROTATION SET. (d) ThE RESULT OF IsOMAP ON THE TEAPOT SIDE VIEW ROTATION SET. (e) ThE RESULT OF PCA ON THE TEAPOT DATA SET. (f) 

The RESULT of the k-CC Isomap on the teapot data set. (g) The result of the M-Isomap on the teapot data set. (h) The result of the revised D-C Isomap on the 

TEAPOT data set. 



set is increased by adding zeros to the bottom of the face image 
vectors. The scale of the teapot data set should also be changed 
such that the scales of the two embeddings is compatible. The 
teapot data vectors are divided by 100, that is, the scale of the 
teapot data points shrinks to of the original one. Fig. [Hb) 
is the 3-D embedding of the IsoFACE data set by the classical 
Isomap with the neighborhood size k = 5. Fig. [8tc) is the 3-D 
embedding of the scaled teapot data set obtained by the classical 
Isomap with the neighborhood size k = 5. Fig. [Sfd) gives the 
result obtained by the 5-CC Isomap method. It can be seen that 
the shape of the IsoFACE data set is distorted badly. Fig. (Sje) 
shows the result obtained by the M-Isomap method with the 
neighborhood size k = 5. The performance of the M-Isomap 
method is significantly improved compared with that of the k- 



CC Isomap. Fig. [§{f) presents the result obtained by the revised 
D-C Isomap method, which is also satisfactory. 

C. Discussion 

In our experiments, there are several important features which 
should be considered: 

1) Since the k-CC Isomap tries to preserve poor and 
good approximations of geodesies simultaneously, its low- 
dimensional embedding is usually deformed. This method 
works well if each data manifold has comparable number of 
data points and the data manifolds can not be very far from 
each other. The algorithm does not work well otherwise. 

2) The revised D-C Isomap overcomes some limitations of 
the original D-C Isomap. Meanwhile, the robustness of 
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Fig. 8 

(a) The IsoFACE and teapot data set, where '•' stands for the IsoFACE data set and 'O' stands for the teapot vertical view data set. (b) The result of Isomap 
ON THE IsoFACE data set. (c) The result of Isomap on the teapot vertical view data set. (d) The result of the k-CC Isomap on the IsoFACE and teapot data set. 
(e) The result of the M-Isomap on the IsoFACE and teapot data set. (f) The result of the revised D-C Isomap on the IsoFACE and teapot data set. 



TABLE III 

The generalization performance of classical Isomap, k-CC Isomap, Original 
D-C Isomap, revised D-C Isomap and M-Isomap for multi-manifolds learning. 



Method Density Dimensionality Isometric Generalization 



classical A A A A 

k-CC O O AD 

Original D-C O O O □ 

revised D-C O O O O 

M-Isomap Q O O O 



VI. Conclusion 

In this paper, the problem of multi-manifolds learning is 
discussed. A general procedure for isometric multi-manifolds 
learning is proposed. The procedure can be used to build multi- 
manifolds learning algorithms which are able to faithfully pre- 
serve not only intra-manifold geodesic distances but also the 
inter-manifolds geodesic distances. The M-Isomap is an imple- 
mentation of the procedure and shows promising results in multi- 
manifolds learning. Compared with the k-CC Isomap which was 
also introduced in this paper based on the general procedure, the 
M-Isomap has the advantage of low computational complexity. 
With the procedure, the D-C Isomap proposed in [24] was revised 
to overcome some of its limitations. Compared with the original 
D-C Isomap, the revised D-C Isomap is more effective in learning 
multi-manifolds data sets. Experiments have also been conducted 
on both synthetic and images data sets to illustrate the efficiency 
of the above multi-manifolds learning algorithms. Future work 
will be conducted on the application of the multi-manifolds 
learning algorithms. 
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